We are strong supporters of open-source tools for reproducible research. Priority index or Pi is developed as a genomic-led target prioritisation system, with the focus on leveraging genetic data to prioritise targets at the gene and pathway level. Based on evidence of genetic association from GWAS data, this prioritisation system is able to resolve modulated genes (seed genes) by utilising knowledge of linkage disequilibrium (co-inherited variants), distance from the gene, and evidence of genetic association with gene expression. Seed genes are scored in an integrative way quantifying the genetic influence. Scored seed genes are subsequently used as baits to rank seed genes plus additional (non-seed) genes; this is achieved by iteratively exploring the global connectivity of a gene interaction network. Genes with the top priority are further used to identify/prioritise pathways that are significantly enriched with highly prioritised genes. Prioritised genes are also used to identify a gene network interconnecting many highly prioritised genes but a few lowly prioritised genes as linkers.
In the Showcases section, we illustrate the power of Pi to perform disease-centric genetic target prioritisation at the gene, pathway and network level using several inflammatory diseases as exemplars, including Ankylosing Spondylitis, Spondyloarthritis, and Systemic Lupus Erythematosus.
In the Ontology annotations & TPN nominations section, we integrate ontology annotations and TPN nominations as an additional support for disease-specific genetic prioritisation by Pi.
In the Cross-disease comparisons section, we summarise and compare results of genetic target prioritisation across diseases, done at the gene, pathway and network level.
A web-based interface of Pi is available here, which allows on-the-fly prioritisation for a user-defined SNP list.
This intends for end-users who are comfortable with R (with the latest version available at http://cran.r-project.org). Priority functions are all contained in a new package called Pi which can be installed following instructions below:
# install the dependant packages including `XGR` and other packages used here
source("http://bioconductor.org/biocLite.R")
biocLite(c("XGR","devtools","VennDiagram"), siteRepos=c("http://cran.r-project.org"))
library(devtools)
install_github(c("hfang-bristol/XGR", "hfang-bristol/dnet"), dependencies=T)
# also, install the `Pi` package itself
library(devtools)
install_github(c("hfang-bristol/Pi"), dependencies=T)
Priority functions are designed in a nested way. The core relation follows this route: xPrioritiserSNPs -> xPrioritiserGenes -> xPrioritiser -> xRWR, achieving gene-level prioritisation from an input list of SNPs (along with their significant level). The output of this route is taken as the input of either xPrioritiserManhattan for visualising gene priority scores, xPrioritiserPathways for prioritising pathways, or xPrioritiserSubnet for identifying a network of top prioritised genes.
xRWR: implements Random Walk with Restart (RWR) estimating the affinity score of nodes in a graph to a list of seed nodes. The affinity score can be viewed as the influential impact over the graph that is collectively imposed by seed nodes. If the seeds are not given, it will pre-compute affinity matrix for nodes in the input graph with respect to each starting node (as a seed) by looping over every node in the graph. A higher-level function xPrioritiser directly relies on it.
xPrioritiser: uses RWR to calculate the affinity score of nodes in a graph to a list of seed nodes. A node that has a higher affinity score to seed nodes will receive a higher priority score. It is an internal function acting as a general template for RWR-based prioritisation. A higher-level function xPrioritiserGenes directly relies on it.
xPrioritiserGenes: prioritises gene targets from an input gene interaction network and the score info imposed on its seed nodes/genes. This function can be considered to be a specific instance of xPrioritiser, that is, specifying a gene interaction network as a graph and seed nodes as seed genes.
There are two sources of gene interaction network information: the STRING database (Szklarczyk et al. 2015) and the Pathways Commons database (Cerami et al. 2011). STRING is a meta-integration of undirect interactions from a functional aspect, while Pathways Commons mainly contains both undirect and direct interactions from a physical/pathway aspect. In addition to interaction type, users can choose the interactions of varying quality:
| Code | Interaction (type and quality) | Database |
|---|---|---|
| STRING_high | Functional interactions (with high confidence scores>=700) | STRING |
| STRING_medium | Functional interactions (with medium confidence scores>=400) | STRING |
| PCommonsUN_high | Physical/undirect interactions (with references & >=2 sources) | Pathways Commons |
| PCommonsUN_medium | Physical/undirect interactions (with references & >=1 sources) | Pathways Commons |
| PCommonsDN_high | Pathway/direct interactions (with references & >=2 sources) | Pathways Commons |
| PCommonsDN_medium | Pathway/direct interactions (with references & >=1 sources) | Pathways Commons |
xPrioritiserSNPs: prioritises gene targets from an input gene interaction network and a given list of SNPs together with the significance level (eg GWAS reported p-values). To do so, it first defines seed genes and their scores that are calculated in an integrative manner to quantify the genetic influence under SNPs. Genetic influential score on a seed gene is calculated from the SNP score (reflecting the SNP significant level), the gene-to-SNP distance weight and the eQTL mapping weight. This function can be considered to be a specific instance of xPrioritiserGenes, that is, specifying seed genes plus their scores.
Knowledge of co-inherited variants is also used to include additional SNPs that are in Linkage Disequilibrium (LD) with GWAS lead SNPs. LD SNPs are calculated based on 1000 Genomes Project data (1000 Genomes Project Consortium 2012). LD SNPs are defined to be any SNPs having R2 > 0.8 with GWAS lead SNPs. The user can choose the population used to calculate LD SNPs:
| Code | Population | Project |
|---|---|---|
| AFR | African | 1000 Genomes Project |
| AMR | Admixed American | 1000 Genomes Project |
| EAS | East Asian | 1000 Genomes Project |
| EUR | European | 1000 Genomes Project |
| SAS | South Asian | 1000 Genomes Project |
xPrioritiserManhattan: visualises prioritised genes using manhattan plot, in which priority for genes is displayed on the Y-axis along with genomic locations on the X-axis. Also highlighted are genes with the top priority.
xPrioritiserPathways: prioritises pathways based on enrichment analysis of genes with the top priority (eg top 100 genes) using a compendium of pathways. A highly prioritised pathway has its member genes with high gene-level priority scores, that is, having evidence of direct modulation by disease-associated lead (or LD) SNPs, and/or having evidence of indirect modulation at the network level.
In addition to pathways, enrichment analysis can be extended to other ontologies, as listed below:
| Code | Ontology | Category |
|---|---|---|
| DO | Disease Ontology | Disease |
| GOMF | Gene Ontology Molecular Function | Function |
| GOBP | Gene Ontology Biological Process | Function |
| GOCC | Gene Ontology Cellular Component | Function |
| HPPA | Human Phenotype Phenotypic Abnormality | Phenotype |
| HPMI | Human Phenotype Mode of Inheritance | Phenotype |
| HPCM | Human Phenotype Clinical Modifier | Phenotype |
| HPMA | Human Phenotype Mortality Aging | Phenotype |
| MP | Mammalian/Mouse Phenotype | Phenotype |
| DGIdb | DGI druggable gene categories | Druggable |
| SF | SCOP domain superfamilies | Domain |
| PS2 | phylostratific age information (our ancestors) | Evolution |
| MsigdbH | Hallmark gene sets | Hallmark (MsigDB) |
| MsigdbC1 | Chromosome and cytogenetic band positional gene sets | Cytogenetics (MsigDB) |
| MsigdbC2CGP | Chemical and genetic perturbation gene sets | Perturbation (MsigDB) |
| MsigdbC2CPall | All pathway gene sets | Pathways (MsigDB) |
| MsigdbC2CP | Canonical pathway gene sets | Pathways (MsigDB) |
| MsigdbC2KEGG | KEGG pathway gene sets | Pathways (MsigDB) |
| MsigdbC2REACTOME | Reactome pathway gene sets | Pathways (MsigDB) |
| MsigdbC2BIOCARTA | BioCarta pathway gene sets | Pathways (MsigDB) |
| MsigdbC3TFT | Transcription factor target gene sets | TF targets (MsigDB) |
| MsigdbC3MIR | microRNA target gene sets | microRNA targets (MsigDB) |
| MsigdbC4CGN | Cancer gene neighborhood gene sets | Cancer (MsigDB) |
| MsigdbC4CM | Cancer module gene sets | Cancer (MsigDB) |
| MsigdbC5BP | GO biological process gene sets | Function (MsigDB) |
| MsigdbC5MF | GO molecular function gene sets | Function (MsigDB) |
| MsigdbC5CC | GO cellular component gene sets | Function (MsigDB) |
| MsigdbC6 | Oncogenic signature gene sets | Oncology (MsigDB) |
| MsigdbC7 | Immunologic signature gene sets | Immunology (MsigDB) |
xPrioritiserSubnet: identifies a gene network that contains as many highly prioritised genes as possible but a few lowly prioritised genes as linkers. An iterative procedure of choosing different priority cutoff is also used to identify the network with a desired number of nodes/genes.
In this section, we give a step-by-step demo of using Pi to carry out disease-specific genetic prioritisation of targets at the gene and pathway level, focusing on three inflammatory diseases as exemplars: Ankylosing Spondylitis (AS), Spondyloarthritis (SA, including AS and Psoriatic Arthritis), and Systemic Lupus Erythematosus (SLE).
First of all, load the packages including Pi:
library(Pi)
# Following packages are also needed
library(XGR)
library(RCircos)
library(VennDiagram)
# optionally, specify the local location of built-in RData
# by default:
RData.location <- "https://github.com/hfang-bristol/RDataCentre/blob/master/Portal"
GWAS SNPs are collected mainly from GWAS Catalog (Welter et al. 2014), complemented by ImmunoBase and latest publications.
Import AS-associated GWAS lead SNPs (with the help of Anna Sanniti)
data.file <- "http://galahad.well.ox.ac.uk/bigdata/AS.txt"
AS <- read.delim(data.file, header=TRUE, stringsAsFactors=FALSE)
The first 5 rows of the data frame AS are shown below, with the column SNP for AS GWAS SNPs and the column Pval for GWAS-detected P-values.
| SNP | Pval |
|---|---|
| rs10045403 | 5.8e-14 |
| rs1018326 | 2.0e-06 |
| rs10440635 | 3.0e-07 |
| rs10781500 | 1.0e-06 |
| rs10865331 | 1.9e-19 |
It includes the following steps:
Define seed genes based on distance-to-SNP window and genetic association with gene expression: nearby genes that are located within 200kb distance window of lead or Linkage Disequilibrium (LD) SNPs that are calculated based on European population data from 1000 Genome Project; expression associated genes (eQTL genes) for which gene expression is, either in a cis- or trans-acting manner, significantly associated with lead or LD SNPs, based on immune-stimulated eQTL data in monocytes (Fairfax et al. 2014).
Score seed genes to quantify the genetic influence under lead or LD SNPs.
Prioritise target genes by estimating their global network connectivity to seed genes. With scored seed genes superposed onto a gene interaction network (see above STRING_high), RWR algorithm is implemented to prioritise candidate target genes based on their network connectivity/affinity to seed genes. As such, a gene that has a higher affinity score to seed genes will receive a higher priority score.
pAS <- xPrioritiserSNPs(data=AS, include.LD="EUR", include.eQTL=c("JKscience_TS2B","JKscience_TS3A"), network="STRING_high", significance.threshold=5e-5, distance.max=200000, decay.kernel="rapid", restart=0.75, RData.location=RData.location)
The results are stored in the data frame pAS$priority, which can be saved into a file AS.genes_priority.txt:
AS_genes <- pAS$priority
write.table(AS_genes, file="AS.genes_priority.txt", sep="\t", row.names=F)
The top 20 genes are listed below:
| name | seed | weight | priority | rank |
|---|---|---|---|---|
| CAST | 1 | 52.13154 | 0.08460 | 1 |
| ERAP2 | 1 | 22.33534 | 0.03626 | 2 |
| ERAP1 | 1 | 20.02777 | 0.03276 | 3 |
| B3GNT2 | 1 | 12.76706 | 0.02077 | 4 |
| IL23R | 1 | 11.19898 | 0.01823 | 5 |
| LNPEP | 1 | 10.36132 | 0.01696 | 6 |
| DDX39B | 1 | 7.91010 | 0.01284 | 7 |
| IL12RB2 | 1 | 7.32161 | 0.01197 | 8 |
| ETS2 | 1 | 7.14037 | 0.01160 | 9 |
| PSMG1 | 1 | 6.69782 | 0.01091 | 10 |
| TBKBP1 | 1 | 6.58648 | 0.01071 | 11 |
| NFKBIL1 | 1 | 6.00768 | 0.01039 | 12 |
| KIF21B | 1 | 5.82870 | 0.01008 | 13 |
| GPR65 | 1 | 5.97141 | 0.00969 | 14 |
| C1orf106 | 1 | 5.21012 | 0.00854 | 15 |
| IL6R | 1 | 4.57166 | 0.00747 | 16 |
| TNFRSF1A | 1 | 4.24279 | 0.00717 | 17 |
| NPEPPS | 1 | 4.12215 | 0.00670 | 18 |
| KPNB1 | 1 | 4.09678 | 0.00666 | 19 |
| GALC | 1 | 4.01292 | 0.00654 | 20 |
These top 20 genes are also highlighted in manhattan plot, in which priority scores for genes are displayed on the Y-axis along with genomic locations on the X-axis.
mp_AS <- xPrioritiserManhattan(pAS, highlight.top=20, cex=0.5, highlight.col="red", highlight.label.size=3, RData.location=RData.location)
print(mp_AS)
Pathway-level prioritisation is based on top 100 genes using a compendium of pathways from diverse sources (Canonical, KEGG, BioCarta and Reactome).
eTerm <- xPrioritiserPathways(pNode=pAS, priority.top=100, ontology="MsigdbC2CPall", RData.location=RData.location)
# view the top pathways/terms
xEnrichViewer(eTerm)
# save results to a file `AS.pathways_priority.txt`
AS_pathways <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
output <- data.frame(term=rownames(AS_pathways), AS_pathways)
write.table(output, file="AS.pathways_priority.txt", sep="\t", row.names=F)
The top 5 prioritised pathways are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| IL27-mediated signaling events | 26 | 5 | 11.00 | 2.7e-08 | 6.3e-07 | IL12B, TYK2, IL12RB2, TBX21, IL27 |
| IL12-mediated signaling events | 63 | 7 | 9.63 | 1.6e-08 | 6.3e-07 | NFKB1, IL12B, TYK2, IL12RB2, IL2RA, NOS2, TBX21 |
| NO2-dependent IL 12 Pathway in NK cells | 17 | 4 | 11.00 | 1.1e-07 | 1.7e-06 | IL12B, TYK2, IL12RB2, NOS2 |
| IL23-mediated signaling events | 37 | 5 | 9.08 | 2.5e-07 | 3.0e-06 | NFKB1, IL12B, TYK2, NOS2, IL23R |
| Cytokine-cytokine receptor interaction | 266 | 11 | 6.55 | 4.6e-07 | 4.3e-06 | CSF2RB, TNFRSF1A, IL12B, IL6R, IL12RB2, IL2RA, CXCR2, LTBR, IL23R, IL1R2, CD27 |
This barplot displays the top 5 prioritised pathways/terms, where a vertical line indicates the adjusted p-value cutoff at 0.05.
df <- AS_pathways[1:5,]
df$name <- factor(df$name, levels=df$name)
p <- ggplot(df, aes(x=name, y=-1*log10(adjp)))
p + geom_bar(stat="identity", fill="deepskyblue") + ylab("Priority scores: -log10(adjp)") + theme(axis.title.y=element_blank(), axis.text.y=element_text(size=12,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=-log10(0.05),color="blue") + coord_flip()
Similarly, look at enrichments for top genes from different aspects:
eTerm <- xPrioritiserPathways(pNode=pAS, priority.top=100, ontology="SF", RData.location=RData.location)
AS_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
The top 8 significant SCOP domains are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| Leukotriene A4 hydrolase N-terminal domain | 12 | 4 | 15.30 | 3.6e-09 | 4.3e-08 | LNPEP, ERAP2, NPEPPS, ERAP1 |
| TNF receptor-like | 24 | 3 | 7.87 | 8.6e-06 | 5.2e-05 | TNFRSF1A, CD27, LTBR |
| p53-like transcription factors | 43 | 3 | 5.67 | 9.2e-05 | 3.7e-04 | NFKB1, RUNX3, TBX21 |
| Metalloproteases (zincins), catalytic domain | 100 | 4 | 4.65 | 2.3e-04 | 6.3e-04 | LNPEP, ERAP2, NPEPPS, ERAP1 |
| Histone-fold | 107 | 4 | 4.44 | 3.2e-04 | 6.3e-04 | HIST1H4I, HIST1H3A, HIST1H3C, HIST1H3B |
| Immunoglobulin | 481 | 9 | 3.95 | 3.1e-04 | 6.3e-04 | TAPBPL, IL1R2, ICAM3, ICAM4, FCGR2A, FCGR2B, FCGR3A, IL6R, IL12B |
| Fibronectin type III | 200 | 5 | 3.72 | 8.5e-04 | 1.5e-03 | IL6R, IL12B, IL12RB2, CSF2RB, IL23R |
| DEATH domain | 87 | 3 | 3.64 | 1.4e-03 | 2.1e-03 | NFKB1, CARD9, TNFRSF1A |
eTerm <- xPrioritiserPathways(pNode=pAS, priority.top=100, ontology="GOMF", RData.location=RData.location)
AS_gomf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
The top 10 significant GO molecular function terms are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| Leukotriene A4 hydrolase N-terminal domain | 12 | 4 | 15.300 | 3.6e-09 | 4.3e-08 | LNPEP, ERAP2, NPEPPS, ERAP1 |
| TNF receptor-like | 24 | 3 | 7.870 | 8.6e-06 | 5.2e-05 | TNFRSF1A, CD27, LTBR |
| p53-like transcription factors | 43 | 3 | 5.670 | 9.2e-05 | 3.7e-04 | NFKB1, RUNX3, TBX21 |
| Metalloproteases (zincins), catalytic domain | 100 | 4 | 4.650 | 2.3e-04 | 6.3e-04 | LNPEP, ERAP2, NPEPPS, ERAP1 |
| Histone-fold | 107 | 4 | 4.440 | 3.2e-04 | 6.3e-04 | HIST1H4I, HIST1H3A, HIST1H3C, HIST1H3B |
| Immunoglobulin | 481 | 9 | 3.950 | 3.1e-04 | 6.3e-04 | TAPBPL, IL1R2, ICAM3, ICAM4, FCGR2A, FCGR2B, FCGR3A, IL6R, IL12B |
| Fibronectin type III | 200 | 5 | 3.720 | 8.5e-04 | 1.5e-03 | IL6R, IL12B, IL12RB2, CSF2RB, IL23R |
| DEATH domain | 87 | 3 | 3.640 | 1.4e-03 | 2.1e-03 | NFKB1, CARD9, TNFRSF1A |
| Homeodomain-like | 294 | 3 | 1.080 | 8.1e-02 | 1.1e-01 | NKX2-3, MIER1, SNAPC4 |
| PH domain-like | 420 | 3 | 0.442 | 2.1e-01 | 2.5e-01 | TYK2, PLEKHG6, SH2B3 |
Network-level prioritisation is to identify a gene network that contains as many highly prioritised genes as possible but a few lowly prioritised genes as linkers. Given a gene network (the same as used in gene-level prioritisation) with nodes labelled with gene priority scores, the search for a maximum-scoring gene subnetwork is briefed as follows:
score transformation, that is, given a threshold of tolerable priority scores, nodes above this threshold (nodes of interest) are scored positively, and negative scores for nodes below the threshold (intolerable).
subnetwork identification, that is, to find an interconnected gene subnetwork enriched with positive-score nodes but allowing for a few negative-score nodes as linkers, done via heuristically solving prize-collecting Steiner tree problem (Fang and Gough 2014).
controlling the subnetwork size, that is, an iterative procedure of scanning different priority thresholds is used to identify the network with a desired number of nodes/genes.
Notably, the preferential use of the same network as used in gene-level prioritisation is due to the fact that gene-level affinity/priority scores are smoothly distributed over the network after being walked. In other words, the chance of identifying such a gene network enriched with top prioritised genes is much higher. To reduce the runtime, by default only top 10% prioritised genes will be used to search for the maximum-scoring gene network.
# find maximum-scoring gene network with the desired node number=50
AS_net <- xPrioritiserSubnet(pNode=pAS, priority.quantite=0.1, subnet.size=50, RData.location=RData.location)
The identified gene network has nodes/genes colored according to their priority scores (see below). Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.
g <- AS_net
pattern <- as.numeric(V(g)$priority)
zmax <- ceiling(quantile(pattern,0.75)*1000)/1000
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="yr", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))
Circos plot of the gene network is shown below:
# remove chromosomes without any genes
xCircos(g, entity="Gene", top_num=ecount(g), entity.label.cex=0.5, chr.exclude="auto", RData.location=RData.location)
Look at enrichments for genes in the network:
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="MsigdbC2CPall", RData.location=RData.location)
g_path <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
Enrichment results for the top 10 significant pathways are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| IL12-mediated signaling events | 63 | 11 | 19.10 | 1.1e-16 | 6.7e-15 | NFKB1, LCK, B2M, IL12B, SOCS1, TYK2, IL12RB2, IL1R1, IL2RA, NOS2, TBX21 |
| IL27-mediated signaling events | 26 | 5 | 13.50 | 2.4e-09 | 7.0e-08 | IL12B, TYK2, IL12RB2, TBX21, IL27 |
| Cytokine-cytokine receptor interaction | 266 | 11 | 8.52 | 4.2e-09 | 7.0e-08 | CSF2RB, TNFRSF1A, IL12B, IL6R, IL12RB2, IL1R1, IL2RA, CXCR1, CXCR2, IL23R, IL1R2 |
| Cytokine Signaling in Immune system | 268 | 11 | 8.48 | 4.6e-09 | 7.0e-08 | CSF2RB, ICAM1, LCK, B2M, SOCS1, IL6R, TYK2, IL1R1, IL2RA, IL1R2, UBA52 |
| NO2-dependent IL 12 Pathway in NK cells | 17 | 4 | 13.50 | 1.5e-08 | 1.8e-07 | IL12B, TYK2, IL12RB2, NOS2 |
| IL23-mediated signaling events | 37 | 5 | 11.20 | 2.3e-08 | 2.4e-07 | NFKB1, IL12B, TYK2, NOS2, IL23R |
| Signaling by Interleukins | 106 | 7 | 8.95 | 4.3e-08 | 3.8e-07 | CSF2RB, LCK, IL6R, TYK2, IL1R1, IL2RA, IL1R2 |
| Jak-STAT signaling pathway | 155 | 8 | 8.28 | 5.6e-08 | 4.2e-07 | CSF2RB, IL12B, SOCS1, IL6R, TYK2, IL12RB2, IL2RA, IL23R |
| Hematopoietic cell lineage | 87 | 6 | 8.48 | 2.1e-07 | 1.4e-06 | ITGB3, IL6R, IL1R1, IL2RA, IL1R2, CD9 |
| Immune System | 912 | 16 | 5.66 | 7.6e-07 | 4.6e-06 | CSF2RB, ICAM1, LCK, B2M, SOCS1, IL6R, TYK2, IL1R1, IL2RA, ICAM3, IL1R2, LNPEP, UBA52, ICAM4, ERAP1, NPEPPS |
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="DGIdb", RData.location=RData.location)
g_dbi <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
Enrichment results for the top 5 significant druggable categories are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| Neutral zinc metallopeptidase | 182 | 4 | 3.56 | 0.00120 | 0.0033 | ERAP2, LNPEP, NPEPPS, ERAP1 |
| Cell surface | 467 | 7 | 3.50 | 0.00081 | 0.0033 | TNFRSF1A, B2M, SCNN1A, ICAM1, CD9, IL1R1, CXCR2 |
| External side of plasma membrane | 189 | 4 | 3.46 | 0.00140 | 0.0033 | B2M, SCNN1A, ICAM1, CD9 |
| Drug resistance | 349 | 5 | 2.80 | 0.00410 | 0.0071 | B2M, ICAM1, SOCS1, LCK, PDE4A |
| Protease | 622 | 5 | 1.37 | 0.05600 | 0.0790 | ERAP2, LNPEP, NPEPPS, ERAP1, CAPNS1 |
Import SA-associated GWAS lead SNPs (with the help of Anna Sanniti and Alicia Lledo Lara)
data.file <- "http://galahad.well.ox.ac.uk/bigdata/Spondyloarthritis.txt"
SA <- read.delim(data.file, header=TRUE, stringsAsFactors=FALSE)
pSA <- xPrioritiserSNPs(data=SA, include.LD="EUR", include.eQTL=c("JKscience_TS2B","JKscience_TS3A"), network="STRING_high", significance.threshold=5e-5, distance.max=200000, decay.kernel="rapid", restart=0.75, RData.location=RData.location)
The results are stored in the data frame pSA$priority, which can be saved into a file SA.genes_priority.txt:
SA_genes <- pSA$priority
write.table(SA_genes, file="SA.genes_priority.txt", sep="\t", row.names=F)
The top 20 genes are listed below:
| name | seed | weight | priority | rank |
|---|---|---|---|---|
| CAST | 1 | 52.13154 | 0.02679 | 1 |
| DDX39B | 1 | 43.38974 | 0.02230 | 2 |
| ATF6B | 1 | 37.30371 | 0.01918 | 3 |
| NFKBIL1 | 1 | 33.00396 | 0.01806 | 4 |
| IER3 | 1 | 22.88007 | 0.01174 | 5 |
| ERAP2 | 1 | 22.33534 | 0.01148 | 6 |
| ERAP1 | 1 | 20.02777 | 0.01037 | 7 |
| TRAF3IP2 | 1 | 20.03787 | 0.01030 | 8 |
| MDC1 | 1 | 19.39800 | 0.00998 | 9 |
| TNIP1 | 1 | 18.67828 | 0.00967 | 10 |
| ANXA6 | 1 | 17.98198 | 0.00925 | 11 |
| VARS2 | 1 | 17.75845 | 0.00913 | 12 |
| LCE3D | 1 | 14.03913 | 0.00903 | 13 |
| LCE3A | 1 | 13.93991 | 0.00849 | 14 |
| LCE3B | 1 | 13.99650 | 0.00818 | 15 |
| LCE3C | 1 | 12.22289 | 0.00802 | 16 |
| LCE3E | 1 | 12.45889 | 0.00751 | 17 |
| TYK2 | 1 | 12.86998 | 0.00672 | 18 |
| B3GNT2 | 1 | 12.76706 | 0.00657 | 19 |
| CNPY2 | 1 | 12.65046 | 0.00652 | 20 |
These top 20 genes are also highlighted in manhattan plot, in which priority scores for genes are displayed on the Y-axis along with genomic locations on the X-axis.
mp_SA <- xPrioritiserManhattan(pSA, highlight.top=20, cex=0.4, highlight.col="red", highlight.label.size=3, RData.location=RData.location)
print(mp_SA)
Pathway-level prioritisation is based on top 100 genes using a compendium of pathways from diverse sources (Canonical, KEGG, BioCarta and Reactome).
eTerm <- xPrioritiserPathways(pNode=pSA, priority.top=100, ontology="MsigdbC2CPall", RData.location=RData.location)
# view the top pathways/terms
xEnrichViewer(eTerm)
# save results to a file `SA.pathways_priority.txt`
SA_pathways <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
output <- data.frame(term=rownames(SA_pathways), SA_pathways)
write.table(output, file="SA.pathways_priority.txt", sep="\t", row.names=F)
The top 10 prioritised pathways are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| NO2-dependent IL 12 Pathway in NK cells | 17 | 4 | 12.30 | 3.5e-08 | 1.0e-06 | IL12B, TYK2, IL12RB2, NOS2 |
| IL23-mediated signaling events | 37 | 5 | 10.30 | 6.5e-08 | 1.0e-06 | IL12B, TYK2, NOS2, IL23A, IL23R |
| IL27-mediated signaling events | 26 | 4 | 9.84 | 3.6e-07 | 3.8e-06 | STAT2, IL12B, TYK2, IL12RB2 |
| IL12 and Stat4 Dependent Signaling Pathway in Th1 Development | 23 | 3 | 7.79 | 9.0e-06 | 7.2e-05 | IL12B, TYK2, IL12RB2 |
| Beta2 integrin cell surface interactions | 29 | 3 | 6.85 | 2.3e-05 | 1.5e-04 | ICAM1, ICAM3, ICAM4 |
| IL12-mediated signaling events | 63 | 4 | 5.97 | 3.3e-05 | 1.5e-04 | IL12B, TYK2, IL12RB2, NOS2 |
| Jak-STAT signaling pathway | 155 | 6 | 5.36 | 3.2e-05 | 1.5e-04 | STAT2, IL12B, TYK2, IL12RB2, IL23A, IL23R |
| Allograft rejection | 37 | 3 | 5.97 | 6.3e-05 | 2.2e-04 | IL12B, HLA-E, HLA-DQA1 |
| Viral myocarditis | 71 | 4 | 5.55 | 5.8e-05 | 2.2e-04 | FYN, ICAM1, HLA-E, HLA-DQA1 |
| Type I diabetes mellitus | 43 | 3 | 5.47 | 1.1e-04 | 3.7e-04 | IL12B, HLA-E, HLA-DQA1 |
This barplot displays the top 10 prioritised pathways/terms, where a vertical line indicates the adjusted p-value cutoff at 0.05.
df <- SA_pathways[1:10,]
df$name <- factor(df$name, levels=df$name)
p <- ggplot(df, aes(x=name, y=-1*log10(adjp)))
p + geom_bar(stat="identity", fill="deepskyblue") + ylab("Priority scores: -log10(adjp)") + theme(axis.title.y=element_blank(), axis.text.y=element_text(size=12,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=-log10(0.05),color="blue") + coord_flip()
Similarly, look at enrichments for top genes from different aspects:
eTerm <- xPrioritiserPathways(pNode=pSA, priority.top=100, ontology="SF", RData.location=RData.location)
SA_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
The top 6 significant SCOP domains are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| Leukotriene A4 hydrolase N-terminal domain | 12 | 3 | 12.50 | 2.0e-07 | 1.8e-06 | LNPEP, ERAP2, ERAP1 |
| Metalloproteases (zincins), catalytic domain | 100 | 3 | 3.75 | 1.2e-03 | 4.0e-03 | LNPEP, ERAP2, ERAP1 |
| SH2 domain | 112 | 3 | 3.46 | 1.8e-03 | 4.0e-03 | TYK2, STAT2, FYN |
| Immunoglobulin | 481 | 7 | 3.26 | 1.6e-03 | 4.0e-03 | HLA-DQA1, HLA-E, ICAM1, ICAM3, ICAM4, ICAM5, IL12B |
| Fibronectin type III | 200 | 3 | 2.17 | 1.4e-02 | 2.5e-02 | IL12B, IL12RB2, IL23R |
| WD40 repeat-like | 281 | 3 | 1.51 | 4.1e-02 | 5.8e-02 | KIF21B, PAN2, WDR83 |
eTerm <- xPrioritiserPathways(pNode=pSA, priority.top=100, ontology="GOMF", RData.location=RData.location)
SA_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
The top 10 significant GO molecular function terms are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| aminopeptidase activity | 28 | 3 | 7.67 | 1.1e-05 | 0.00019 | ERAP2, LNPEP, ERAP1 |
| integrin binding | 105 | 4 | 4.83 | 1.8e-04 | 0.00160 | ICAM1, ICAM3, ICAM4, ICAM5 |
| ATP-dependent RNA helicase activity | 63 | 3 | 4.81 | 2.7e-04 | 0.00160 | DHX30, SKIV2L, DDX39B |
| RNA binding | 404 | 5 | 2.13 | 1.5e-02 | 0.07000 | RPL34, SETD1A, ILF3, SKIV2L, PUS10 |
| receptor binding | 321 | 4 | 1.92 | 2.2e-02 | 0.08000 | HLA-E, ICAM3, NOS2, APOF |
| signal transducer activity | 227 | 3 | 1.77 | 2.7e-02 | 0.08100 | STAT2, SLC44A2, KCNH7 |
| poly(A) RNA binding | 1118 | 9 | 1.51 | 5.1e-02 | 0.13000 | MRPL4, RAVER1, DHX30, SSRP1, XRCC6, ILF3, CAST, DDX39B, CS |
| nucleotide binding | 304 | 3 | 1.22 | 6.6e-02 | 0.15000 | RAVER1, REV3L, SETD1A |
| protein serine/threonine kinase activity | 325 | 3 | 1.09 | 8.0e-02 | 0.16000 | SQSTM1, CSNK2A1, BCKDK |
| ATP binding | 1454 | 10 | 1.07 | 1.1e-01 | 0.19000 | FYN, CSNK2A1, DHX30, XRCC6, SKIV2L, KIF21B, BCKDK, VARS2, DDX39B, TYK2 |
# find maximum-scoring gene network with the desired node number=50
SA_net <- xPrioritiserSubnet(pNode=pSA, priority.quantite=0.1, subnet.size=50, RData.location=RData.location)
The identified gene network has nodes/genes colored according to their priority scores (see below). Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.
g <- SA_net
pattern <- as.numeric(V(g)$priority)
zmax <- ceiling(quantile(pattern,0.75)*1000)/1000
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="yr", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))
Circos plot of the gene network is shown below:
# remove chromosomes without any genes
xCircos(g, entity="Gene", top_num=ecount(g), entity.label.cex=0.5, chr.exclude="auto", RData.location=RData.location)
Look at enrichments for genes in the network:
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="MsigdbC2CPall", RData.location=RData.location)
g_path <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
Enrichment results for the top 10 significant pathways are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| Beta2 integrin cell surface interactions | 29 | 7 | 17.10 | 2.0e-12 | 1.9e-10 | ICAM1, ITGAL, ITGAX, ITGAM, ICAM3, FCGR2A, ICAM4 |
| Integrin cell surface interactions | 79 | 8 | 11.50 | 3.8e-10 | 1.4e-08 | ITGB1, ICAM1, ITGAL, ITGAX, ITGB3, ITGAM, ICAM3, ICAM4 |
| Jak-STAT signaling pathway | 155 | 10 | 9.93 | 6.0e-10 | 1.4e-08 | IL4, STAT2, IL12B, IL13, IL6R, TYK2, IL12RB2, CTF1, IL23A, IL23R |
| Immune System | 912 | 21 | 7.46 | 5.9e-10 | 1.4e-08 | NFKBIA, ITGB1, RPS6KA1, STAT2, FYN, ICAM1, ITGAL, TNFAIP3, IL6R, TYK2, PSMA6, ICAM3, DDX58, RNF41, LNPEP, UBA52, REL, SQSTM1, ICAM4, ERAP1, NPEPPS |
| IL23-mediated signaling events | 37 | 6 | 12.80 | 9.9e-10 | 1.9e-08 | NFKBIA, IL12B, TYK2, NOS2, IL23A, IL23R |
| Monocyte and its Surface Molecules | 11 | 4 | 15.90 | 2.0e-09 | 3.1e-08 | ITGB1, ICAM1, ITGAL, ITGAM |
| IL27-mediated signaling events | 26 | 5 | 12.80 | 4.7e-09 | 5.0e-08 | STAT2, IL12B, TYK2, IL12RB2, TBX21 |
| Integrin family cell surface interactions | 26 | 5 | 12.80 | 4.7e-09 | 5.0e-08 | ITGB1, ITGAL, ITGAX, ITGB3, ITGAM |
| Leishmania infection | 72 | 7 | 10.50 | 4.8e-09 | 5.0e-08 | NFKBIA, IL4, ITGB1, IL12B, ITGAM, NOS2, FCGR2A |
| NO2-dependent IL 12 Pathway in NK cells | 17 | 4 | 12.70 | 2.6e-08 | 2.4e-07 | IL12B, TYK2, IL12RB2, NOS2 |
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="DGIdb", RData.location=RData.location)
g_dbi <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
Enrichment results for the top 5 significant druggable categories are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| External side of plasma membrane | 189 | 7 | 6.75 | 1.2e-06 | 1.2e-05 | SCNN1A, ITGAL, ITGAX, ITGB1, ICAM1, IL4, IL13 |
| Cell surface | 467 | 9 | 4.93 | 2.1e-05 | 1.1e-04 | TNFRSF1A, SCNN1A, STX4, ITGAL, ITGAX, ITGB1, ICAM1, IL4, IL13 |
| Tumor suppressor | 716 | 9 | 3.40 | 7.7e-04 | 2.6e-03 | ITGB1, TNFAIP3, UBA52, IL12B, CSNK2A1, PSMA6, ETS1, CDC37, RUNX3 |
| Protease inhibitor | 179 | 3 | 2.47 | 8.1e-03 | 1.7e-02 | TNFAIP3, RPS6KA1, CAST |
| Neutral zinc metallopeptidase | 182 | 3 | 2.44 | 8.6e-03 | 1.7e-02 | LNPEP, NPEPPS, ERAP1 |
Import SLE-associated GWAS lead SNPs
data.file <- "http://galahad.well.ox.ac.uk/bigdata/SLE.txt"
SLE <- read.delim(data.file, header=TRUE, stringsAsFactors=FALSE)
pSLE <- xPrioritiserSNPs(data=SLE, include.LD="EUR", include.eQTL=c("JKscience_TS2B","JKscience_TS3A"), network="STRING_high", significance.threshold=5e-5, distance.max=200000, decay.kernel="rapid", restart=0.75, RData.location=RData.location)
The results are stored in the data frame pSLE$priority, which can be saved into a file SLE.genes_priority.txt:
SLE_genes <- pSLE$priority
write.table(SLE_genes, file="SLE.genes_priority.txt", sep="\t", row.names=F)
The top 20 genes are listed below:
| name | seed | weight | priority | rank |
|---|---|---|---|---|
| ATF6B | 1 | 149.00804 | 0.04358 | 1 |
| NELFE | 1 | 79.37208 | 0.02335 | 2 |
| VARS2 | 1 | 69.41208 | 0.02029 | 3 |
| ITGAM | 1 | 63.64914 | 0.01884 | 4 |
| PBX2 | 1 | 60.75875 | 0.01783 | 5 |
| PYCARD | 1 | 59.68441 | 0.01753 | 6 |
| CUEDC2 | 1 | 59.78891 | 0.01746 | 7 |
| STAT4 | 1 | 57.81733 | 0.01707 | 8 |
| HLA-DPB1 | 1 | 52.92738 | 0.01576 | 9 |
| IRF5 | 1 | 52.36363 | 0.01540 | 10 |
| TNPO3 | 1 | 52.36363 | 0.01533 | 11 |
| CLIC1 | 1 | 52.07534 | 0.01527 | 12 |
| PSMB9 | 1 | 50.63699 | 0.01482 | 13 |
| RNF114 | 1 | 48.59792 | 0.01427 | 14 |
| C2 | 1 | 46.16449 | 0.01354 | 15 |
| LTA | 1 | 42.43087 | 0.01303 | 16 |
| SKIV2L | 1 | 42.63385 | 0.01275 | 17 |
| NCF2 | 1 | 41.58059 | 0.01219 | 18 |
| HNRNPM | 1 | 41.47880 | 0.01217 | 19 |
| SMG7 | 1 | 38.82899 | 0.01137 | 20 |
These top 20 genes are also highlighted in manhattan plot, in which priority scores for genes are displayed on the Y-axis along with genomic locations on the X-axis.
mp_SLE <- xPrioritiserManhattan(pSLE, highlight.top=20, cex=0.4, highlight.col="red", highlight.label.size=3, RData.location=RData.location)
print(mp_SLE)
Pathway-level prioritisation is based on top 100 genes using a compendium of pathways from diverse sources (Canonical, KEGG, BioCarta and Reactome).
eTerm <- xPrioritiserPathways(pNode=pSLE, priority.top=100, ontology="MsigdbC2CPall", RData.location=RData.location)
# view the top pathways/terms
xEnrichViewer(eTerm)
# save results to a file `SLE.pathways_priority.txt`
SLE_pathways <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
output <- data.frame(term=rownames(SLE_pathways), SLE_pathways)
write.table(output, file="SLE.pathways_priority.txt", sep="\t", row.names=F)
The top 10 prioritised pathways are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| Beta2 integrin cell surface interactions | 29 | 5 | 10.50 | 5.0e-08 | 1.4e-06 | ITGAX, ITGAM, ICAM3, FCGR2A, ITGAD |
| Leishmania infection | 72 | 7 | 9.00 | 4.3e-08 | 1.4e-06 | STAT1, NCF2, ITGAM, HLA-DOB, HLA-DPB1, HLA-DQA1, FCGR2A |
| Immune System | 912 | 21 | 5.90 | 2.1e-07 | 3.9e-06 | CD80, STAT1, TNFAIP3, C2, C4A, CFB, NCF2, TYK2, SKP1, UBE2L3, HLA-DOB, HLA-DPB1, HLA-DQA1, ICAM3, PYCARD, PSMB9, IFIH1, IRF5, BLK, IRF8, SQSTM1 |
| Type I diabetes mellitus | 43 | 5 | 8.42 | 5.9e-07 | 8.4e-06 | CD80, LTA, HLA-DOB, HLA-DPB1, HLA-DQA1 |
| Initial triggering of complement | 13 | 3 | 9.47 | 1.8e-06 | 1.7e-05 | C2, C4A, CFB |
| Regulation of Complement cascade | 13 | 3 | 9.47 | 1.8e-06 | 1.7e-05 | C2, C4A, CFB |
| IL22 Soluble Receptor Signaling Pathway | 16 | 3 | 8.48 | 4.4e-06 | 3.6e-05 | STAT1, STAT4, TYK2 |
| Interferon gamma signaling | 62 | 5 | 6.81 | 5.3e-06 | 3.8e-05 | STAT1, HLA-DPB1, HLA-DQA1, IRF5, IRF8 |
| Allograft rejection | 37 | 4 | 7.22 | 6.4e-06 | 3.9e-05 | CD80, HLA-DOB, HLA-DPB1, HLA-DQA1 |
| Systemic lupus erythematosus | 139 | 7 | 6.01 | 6.8e-06 | 3.9e-05 | CD80, C2, C4A, HLA-DOB, HLA-DPB1, HLA-DQA1, FCGR2A |
This barplot displays the top 10 prioritised pathways/terms, where a vertical line indicates the adjusted p-value cutoff at 0.05.
df <- SLE_pathways[1:10,]
df$name <- factor(df$name, levels=df$name)
p <- ggplot(df, aes(x=name, y=-1*log10(adjp)))
p + geom_bar(stat="identity", fill="deepskyblue") + ylab("Priority scores: -log10(adjp)") + theme(axis.title.y=element_blank(), axis.text.y=element_text(size=12,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=-log10(0.05),color="blue") + coord_flip()
Similarly, look at enrichments for top genes from different aspects:
eTerm <- xPrioritiserPathways(pNode=pSLE, priority.top=100, ontology="SF", RData.location=RData.location)
SLE_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
The top 8 significant SCOP domains are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| vWA-like | 95 | 6 | 7.33 | 1.3e-06 | 2.0e-05 | CFB, C2, ITGAD, ITGAM, ITGAX, XRCC6 |
| Integrin alpha N-terminal domain | 23 | 3 | 7.82 | 8.9e-06 | 6.3e-05 | ITGAD, ITGAM, ITGAX |
| Integrin domains | 25 | 3 | 7.47 | 1.3e-05 | 6.3e-05 | ITGAD, ITGAM, ITGAX |
| MHC antigen-recognition domain | 40 | 3 | 5.73 | 8.6e-05 | 3.2e-04 | HLA-DOB, HLA-DPB1, HLA-DQA1 |
| TNF-like | 53 | 3 | 4.84 | 2.6e-04 | 7.8e-04 | LTB, TNFSF4, LTA |
| SH2 domain | 112 | 4 | 4.15 | 5.0e-04 | 1.3e-03 | TYK2, STAT1, STAT4, BLK |
| RING/U-box | 334 | 5 | 2.20 | 1.4e-02 | 2.9e-02 | SQSTM1, TRIM72, PPIL2, RNF114, PHRF1 |
| Immunoglobulin | 481 | 6 | 1.93 | 2.3e-02 | 3.8e-02 | HLA-DOB, HLA-DPB1, HLA-DQA1, ICAM3, CD80, FCGR2A |
eTerm <- xPrioritiserPathways(pNode=pSLE, priority.top=100, ontology="GOMF", RData.location=RData.location)
SLE_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
The top 10 significant GO molecular function terms are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| tumor necrosis factor receptor binding | 30 | 4 | 9.30 | 6.7e-07 | 1.5e-05 | STAT1, LTA, LTB, TNFSF4 |
| ATP-dependent RNA helicase activity | 63 | 3 | 4.43 | 4.5e-04 | 5.2e-03 | DHX30, SKIV2L, DDX39B |
| receptor binding | 321 | 6 | 3.13 | 2.3e-03 | 1.8e-02 | HSPA1A, ICAM3, LTA, BLK, LTB, TNFSF4 |
| actin filament binding | 113 | 3 | 2.96 | 3.9e-03 | 2.3e-02 | AIF1, ARPC5, FLNC |
| sequence-specific DNA binding transcription factor activity | 825 | 10 | 2.53 | 6.9e-03 | 3.2e-02 | IKZF1, PTTG1, MECP2, ETS1, ATF6B, PBX2, PHTF1, STAT1, STAT4, IRF5 |
| cytokine activity | 172 | 3 | 2.06 | 1.7e-02 | 5.9e-02 | LTA, LTB, TNFSF4 |
| transcription factor binding | 266 | 4 | 2.04 | 1.8e-02 | 5.9e-02 | MECP2, ETS1, PBX2, GPX3 |
| ligase activity | 283 | 4 | 1.91 | 2.3e-02 | 6.5e-02 | RNF114, UBE2L3, PPIL2, TNFAIP3 |
| sequence-specific DNA binding | 407 | 5 | 1.79 | 2.8e-02 | 6.6e-02 | IKZF1, ETS1, ATF6B, PBX2, STAT4 |
| ATP binding | 1454 | 13 | 1.74 | 3.4e-02 | 6.6e-02 | UBE2L3, IFIH1, HSPA1A, DHX30, BLK, XRCC6, SKIV2L, RENBP, VARS2, DDX39B, NADSYN1, NMNAT2, TYK2 |
# find maximum-scoring gene network with the desired node number=50
SLE_net <- xPrioritiserSubnet(pNode=pSLE, priority.quantite=0.1, subnet.size=50, RData.location=RData.location)
The identified gene network has nodes/genes colored according to their priority scores (see below). Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.
g <- SLE_net
pattern <- as.numeric(V(g)$priority)
zmax <- ceiling(quantile(pattern,0.75)*1000)/1000
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="yr", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))
Circos plot of the gene network is shown below:
# remove chromosomes without any genes
xCircos(g, entity="Gene", top_num=ecount(g), entity.label.cex=0.5, chr.exclude="auto", RData.location=RData.location)
Look at enrichments for genes in the network:
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="MsigdbC2CPall", RData.location=RData.location)
g_path <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
Enrichment results for the top 10 significant pathways are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| Immune System | 912 | 22 | 8.63 | 4.5e-12 | 4.9e-10 | CD80, HRAS, STAT1, FCGR2B, CSK, CDKN1B, NCF2, SOCS1, IRAK1, TYK2, CD44, SKP1, RASGRP3, IRF7, PYCARD, FCGR3A, RASGRP1, UBA52, IRF5, BLK, IRF8, SQSTM1 |
| Leishmania infection | 72 | 8 | 12.80 | 5.7e-11 | 3.1e-09 | IL10, STAT1, TNF, NCF2, ITGAM, IRAK1, FCGR2A, FCGR3A |
| Cytokine Signaling in Immune system | 268 | 12 | 9.36 | 3.4e-10 | 1.2e-08 | HRAS, STAT1, SOCS1, IRAK1, TYK2, CD44, SKP1, IRF7, UBA52, IRF5, IRF8, SQSTM1 |
| IL-10 Anti-inflammatory Signaling Pathway | 17 | 4 | 13.50 | 1.5e-08 | 4.1e-07 | IL10, STAT1, STAT4, TNF |
| Interferon gamma signaling | 62 | 6 | 10.30 | 1.9e-08 | 4.3e-07 | STAT1, SOCS1, CD44, IRF7, IRF5, IRF8 |
| Interferon alpha/beta signaling | 64 | 6 | 10.10 | 2.4e-08 | 4.5e-07 | STAT1, SOCS1, TYK2, IRF7, IRF5, IRF8 |
| Interferon Signaling | 158 | 8 | 8.19 | 6.6e-08 | 1.0e-06 | STAT1, SOCS1, TYK2, CD44, IRF7, UBA52, IRF5, IRF8 |
| IL27-mediated signaling events | 26 | 4 | 10.80 | 1.5e-07 | 1.7e-06 | STAT1, STAT4, TNF, TYK2 |
| Signaling by the B Cell Receptor (BCR) | 123 | 7 | 8.20 | 1.4e-07 | 1.7e-06 | HRAS, CDKN1B, SKP1, RASGRP3, RASGRP1, UBA52, BLK |
| Adaptive Immune System | 524 | 13 | 6.61 | 1.4e-07 | 1.7e-06 | CD80, HRAS, FCGR2B, CSK, CDKN1B, NCF2, SOCS1, SKP1, RASGRP3, FCGR3A, RASGRP1, UBA52, BLK |
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="DGIdb", RData.location=RData.location)
g_dbi <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
Enrichment results for the top 5 significant druggable categories are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| Clinically actionable | 237 | 6 | 4.30 | 0.00020 | 0.0017 | CDKN1B, SOCS1, TP53, HRAS, PRDM1, STAT4 |
| Histone modification | 249 | 6 | 4.14 | 0.00028 | 0.0017 | TP53, CDK9, HCFC1, KAT8, MECP2, SKP1 |
| External side of plasma membrane | 189 | 5 | 4.04 | 0.00042 | 0.0017 | TNF, ITGAX, CD80, FCGR3A, CD44 |
| Drug resistance | 349 | 6 | 3.13 | 0.00210 | 0.0062 | TNF, SOCS1, TP53, IL10, GPX3, NCF2 |
| Tumor suppressor | 716 | 9 | 2.82 | 0.00320 | 0.0076 | TNF, CDKN1B, TP53, HRAS, CDK9, UBA52, HCFC1, ETS1, CDC37 |
Import DD-associated GWAS lead SNPs
data.file <- "http://galahad.well.ox.ac.uk/bigdata/Dupuytren.txt"
DD <- read.delim(data.file, header=TRUE, stringsAsFactors=FALSE)
The 13 DD-associated SNPs are:
| SNPs | Pvalue |
|---|---|
| rs16879765 | 6e-39 |
| rs6519955 | 3e-33 |
| rs611744 | 8e-15 |
| rs11672517 | 7e-14 |
| rs2912522 | 2e-13 |
| rs8124695 | 8e-10 |
| rs7524102 | 3e-09 |
| rs10809650 | 6e-09 |
| rs4730775 | 3e-08 |
| rs2179367 | 3e-07 |
| rs4789939 | 6e-07 |
| rs4932194 | 8e-07 |
| rs12032381 | 6e-06 |
pDD <- xPrioritiserSNPs(data=DD, include.LD="EUR", include.eQTL=c("JKscience_TS2B","JKscience_TS3A"), network="STRING_high", significance.threshold=5e-5, distance.max=200000, decay.kernel="rapid", restart=0.75, RData.location=RData.location)
The results are stored in the data frame pDD$priority, which can be saved into a file DD.genes_priority.txt:
DD_genes <- pDD$priority
write.table(DD_genes, file="DD.genes_priority.txt", sep="\t", row.names=F)
The top 20 genes are listed below:
| name | seed | weight | priority | rank |
|---|---|---|---|---|
| PRR34 | 1 | 11.96220 | 0.11660 | 1 |
| SFRP4 | 1 | 12.43433 | 0.11429 | 2 |
| NME8 | 1 | 10.15061 | 0.09302 | 3 |
| WNT7B | 1 | 8.90219 | 0.08214 | 4 |
| EIF3E | 1 | 4.89795 | 0.04480 | 5 |
| DUXA | 1 | 4.42695 | 0.04138 | 6 |
| RSPO2 | 1 | 4.50962 | 0.04136 | 7 |
| ZIM3 | 1 | 3.52149 | 0.03572 | 8 |
| USP29 | 1 | 3.01678 | 0.03165 | 9 |
| BPIFC | 0 | 0.00000 | 0.02915 | 10 |
| PPARA | 1 | 2.79592 | 0.02560 | 11 |
| TIMP2 | 1 | 2.23830 | 0.02052 | 12 |
| AURKC | 1 | 2.04431 | 0.01872 | 13 |
| WNT2 | 1 | 1.61094 | 0.01569 | 14 |
| ZC3H12D | 1 | 1.04283 | 0.00953 | 15 |
| NUP43 | 1 | 0.96004 | 0.00878 | 16 |
| ZBTB40 | 1 | 0.91339 | 0.00849 | 17 |
| PCMT1 | 1 | 0.91664 | 0.00838 | 18 |
| ZNF460 | 1 | 0.82780 | 0.00758 | 19 |
| TAB2 | 1 | 0.80465 | 0.00737 | 20 |
These top 20 genes are also highlighted in manhattan plot, in which priority scores for genes are displayed on the Y-axis along with genomic locations on the X-axis.
mp_DD <- xPrioritiserManhattan(pDD, highlight.top=20, cex=0.4, highlight.col="red", highlight.label.size=3, RData.location=RData.location)
print(mp_DD)
Pathway-level prioritisation is based on top 100 genes using a compendium of pathways from diverse sources (Canonical, KEGG, BioCarta and Reactome).
eTerm <- xPrioritiserPathways(pNode=pDD, priority.top=100, ontology="MsigdbC2CPall", RData.location=RData.location)
# view the top pathways/terms
xEnrichViewer(eTerm)
# save results to a file `DD.pathways_priority.txt`
DD_pathways <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
output <- data.frame(term=rownames(DD_pathways), DD_pathways)
write.table(output, file="DD.pathways_priority.txt", sep="\t", row.names=F)
The top 10 prioritised pathways are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| Basal cell carcinoma | 55 | 29 | 47.4 | 0.0e+00 | 0.0e+00 | WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3, WNT5B |
| Melanogenesis | 101 | 29 | 34.7 | 0.0e+00 | 0.0e+00 | WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3, WNT5B |
| Wnt signaling pathway | 151 | 32 | 31.1 | 0.0e+00 | 0.0e+00 | WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3, WNT5B, SFRP4, DKK2, DKK1 |
| Class B/2 (Secretin family receptors) | 87 | 27 | 34.8 | 0.0e+00 | 0.0e+00 | WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3 |
| Genes related to Wnt-mediated signal transduction | 89 | 26 | 33.1 | 0.0e+00 | 0.0e+00 | WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT11, WNT2B, FZD4, WNT9A, FZD6, FZD7, FZD2, FZD8, WNT10A, WNT3A, WNT16, FZD5, FZD3, WNT5B, SFRP4, DKK1 |
| Wnt signaling network | 28 | 17 | 38.9 | 0.0e+00 | 0.0e+00 | WNT1, FZD1, WNT2, WNT3, WNT5A, WNT7A, WNT7B, FZD4, FZD6, FZD7, FZD2, FZD8, FZD9, WNT3A, FZD10, FZD5, DKK1 |
| Hedgehog signaling pathway | 56 | 19 | 30.5 | 0.0e+00 | 0.0e+00 | WNT1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, WNT9A, WNT9B, WNT10A, WNT3A, WNT16, WNT5B |
| Pathways in cancer | 325 | 29 | 18.5 | 0.0e+00 | 0.0e+00 | WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3, WNT5B |
| GPCR ligand binding | 400 | 27 | 15.2 | 0.0e+00 | 0.0e+00 | WNT1, FZD1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8B, WNT10B, WNT11, WNT2B, FZD4, WNT9A, FZD6, WNT9B, FZD7, FZD2, FZD8, WNT10A, FZD9, WNT3A, FZD10, WNT16, FZD5, FZD3 |
| Genes encoding secreted soluble factors | 344 | 21 | 12.6 | 1.4e-16 | 3.2e-16 | WNT1, WNT4, WNT2, WNT3, WNT5A, WNT6, WNT7A, WNT7B, WNT8A, WNT8B, WNT10B, WNT11, WNT2B, WNT9A, WNT9B, WNT10A, WNT3A, WNT16, WNT5B, SFRP4, MEGF6 |
This barplot displays the top 10 prioritised pathways/terms, where a vertical line indicates the adjusted p-value cutoff at 0.05.
df <- DD_pathways[1:10,]
df$name <- factor(df$name, levels=df$name)
p <- ggplot(df, aes(x=name, y=-1*log10(adjp)))
p + geom_bar(stat="identity", fill="deepskyblue") + ylab("Priority scores: -log10(adjp)") + theme(axis.title.y=element_blank(), axis.text.y=element_text(size=12,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=-log10(0.05),color="blue") + coord_flip()
Similarly, look at enrichments for top genes from different aspects:
eTerm <- xPrioritiserPathways(pNode=pDD, priority.top=100, ontology="SF", RData.location=RData.location)
DD_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
The top 6 significant SCOP domains are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| Frizzled cysteine-rich domain | 20 | 11 | 34.00 | 0.0e+00 | 0.0e+00 | SFRP4, FZD1, FZD4, FZD6, FZD7, FZD8, FZD10, FZD9, FZD3, FZD5, FZD2 |
| Thioredoxin-like | 127 | 12 | 14.10 | 6.8e-14 | 3.1e-13 | TXNL1, TXNRD1, TXNRD3, PRDX2, PRDX1, NME8, PRDX3, TXNDC2, PRDX5, TXNDC8, TXN2, PRDX4 |
| FAD/NAD(P)-binding domain | 54 | 3 | 5.17 | 1.7e-04 | 5.1e-04 | TXNRD1, TXNRD3, TXNRD2 |
| Growth factor receptor domain | 127 | 4 | 4.15 | 5.0e-04 | 1.1e-03 | RSPO4, MEGF6, RSPO3, RSPO2 |
| Cysteine proteinases | 149 | 3 | 2.56 | 7.4e-03 | 1.3e-02 | USP5, USP29, USP36 |
| Homeodomain-like | 294 | 4 | 2.04 | 1.8e-02 | 2.7e-02 | DUXA, ARGFX, TPRX1, LEUTX |
eTerm <- xPrioritiserPathways(pNode=pDD, priority.top=100, ontology="GOMF", RData.location=RData.location)
DD_sf <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
The top 10 significant GO molecular function terms are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| frizzled binding | 37 | 23 | 48.10 | 0.0e+00 | 0.0e+00 | WNT5A, ZNRF3, WNT3A, WNT1, WNT3, RSPO3, WNT7A, WNT5B, FZD1, FZD7, WNT4, WNT11, WNT2, WNT8A, WNT10B, WNT6, WNT7B, WNT8B, WNT10A, WNT2B, WNT9A, WNT9B, WNT16 |
| Wnt-activated receptor activity | 21 | 11 | 30.50 | 0.0e+00 | 0.0e+00 | FZD4, FZD1, FZD8, SFRP4, FZD5, FZD2, FZD6, FZD7, FZD9, FZD3, FZD10 |
| Wnt-protein binding | 28 | 11 | 26.30 | 4.0e-20 | 3.3e-19 | FZD4, FZD1, FZD8, SFRP4, FZD5, FZD2, FZD6, FZD7, FZD9, FZD3, FZD10 |
| receptor agonist activity | 15 | 9 | 29.50 | 1.3e-19 | 8.8e-19 | WNT5A, WNT3A, WNT1, WNT3, WNT7A, WNT4, WNT2, WNT8A, WNT10B |
| protein disulfide oxidoreductase activity | 23 | 6 | 15.70 | 5.6e-11 | 3.2e-10 | TXNRD1, TXNRD3, TXN2, TXNDC2, TXNDC8, TXNL1 |
| PDZ domain binding | 86 | 6 | 7.60 | 9.1e-07 | 4.2e-06 | FZD4, FZD1, FZD8, FZD2, FZD7, FZD3 |
| G-protein coupled receptor activity | 643 | 12 | 4.18 | 1.4e-04 | 5.4e-04 | FZD4, FZD1, FZD8, SFRP4, FZD5, FZD2, FZD6, FZD7, FZD9, FZD3, FZD10, LGR6 |
| G-protein coupled receptor binding | 45 | 3 | 5.22 | 1.6e-04 | 5.6e-04 | RSPO3, RSPO2, RSPO4 |
| flavin adenine dinucleotide binding | 62 | 3 | 4.28 | 5.5e-04 | 1.7e-03 | TXNRD1, TXNRD3, TXNRD2 |
| microtubule motor activity | 68 | 3 | 4.03 | 7.9e-04 | 2.0e-03 | DNAH11, DNAH5, DNAI2 |
# find maximum-scoring gene network with the desired node number=20
DD_net <- xPrioritiserSubnet(pNode=pDD, priority.quantite=0.1, subnet.size=21, RData.location=RData.location)
The identified gene network has nodes/genes colored according to their priority scores (see below). Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.
g <- DD_net
pattern <- as.numeric(V(g)$priority)
zmax <- ceiling(quantile(pattern,0.75)*1000)/1000
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="yr", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))
Circos plot of the gene network is shown below:
# remove chromosomes without any genes
xCircos(g, entity="Gene", top_num=ecount(g), entity.label.cex=0.5, chr.exclude="auto", RData.location=RData.location)
Look at enrichments for genes in the network:
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="MsigdbC2CPall", RData.location=RData.location)
g_path <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
Enrichment results for the top 10 significant pathways are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| Basal cell carcinoma | 55 | 9 | 25.80 | 3.5e-18 | 4.9e-17 | WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10 |
| Class B/2 (Secretin family receptors) | 87 | 9 | 20.40 | 4.7e-16 | 2.8e-15 | WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10 |
| Genes related to Wnt-mediated signal transduction | 89 | 9 | 20.20 | 5.9e-16 | 2.8e-15 | WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, SFRP4 |
| Melanogenesis | 101 | 9 | 18.90 | 2.2e-15 | 6.2e-15 | WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10 |
| Wnt signaling pathway | 151 | 10 | 17.10 | 1.8e-15 | 6.2e-15 | WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10, SFRP4 |
| Wnt signaling network | 28 | 6 | 24.20 | 7.4e-14 | 1.7e-13 | WNT2, WNT7B, FZD4, FZD7, FZD8, FZD10 |
| Pathways in cancer | 325 | 11 | 12.50 | 2.2e-13 | 4.4e-13 | EGFR, WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10, MMP9 |
| Hedgehog signaling pathway | 56 | 5 | 14.10 | 1.3e-09 | 2.3e-09 | WNT2, WNT6, WNT7B, WNT11, WNT10A |
| GPCR ligand binding | 400 | 9 | 8.95 | 2.2e-09 | 3.5e-09 | WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10 |
| Signaling by GPCR | 906 | 10 | 6.06 | 4.7e-07 | 6.5e-07 | EGFR, WNT2, WNT6, WNT7B, WNT11, FZD4, FZD7, FZD8, WNT10A, FZD10 |
data <- V(g)$name
eTerm <- xEnricherGenes(data, ontology="DGIdb", RData.location=RData.location)
g_dbi <- xEnrichViewer(eTerm, top_num=length(eTerm$adjp), sortBy="adjp", details=T)
Enrichment results for the top 3 significant druggable categories are shown below:
| name | nAnno | nOverlap | zscore | pvalue | adjp | members |
|---|---|---|---|---|---|---|
| Cell surface | 467 | 6 | 5.0 | 2.9e-05 | 5.8e-05 | SFRP4, FZD10, WNT6, FZD4, RSPO2, TIMP2 |
| G protein coupled receptor | 855 | 4 | 1.6 | 3.4e-02 | 3.4e-02 | FZD10, FZD4, FZD7, FZD8 |
| NA | NA | NA | NA | NA | NA | NA |
In this section, we first describe annotation predictors using ontologies and target genes nominated by ULTRA-DD TPN members, and then integrate them as an additional support for disease-specific genetic prioritisation by Pi.
Using ontologies to annotate genes is one of the most effective ways to reuse existing knowledge. Also it is a scalable way to capture a particular knowledge sphere in an systematic way. An ontology is like a dictionary, containing vocabularies (called terms) and their relations. These terms and relations are understandable by humans, and readable by computers. According to how to organise relationships between terms, there are types of ontologies:
When integrating ontologies of different knowledge spheres, the use is limited by the heterogeneous nature: different terms/keywords represent the same or similar knowledge. For this reason, they are better used in a broader context to capture a general category of knowledge, eg a group of diseases instead of a specific disease. We consider the context broadly relevant to immune-related diseases. We choose 5 annotation predictors as informative carriers generalising their relatedness to immunity/inflammation diseases. For a given gene, we look at:
Since the above predictors are relatively independent to each other, additive scoring scheme is used to calculate annotation scores per gene.
Import annotation predictors (pre-computed)
data.file <- "http://galahad.well.ox.ac.uk/bigdata/iAnno.txt"
iAnno <- read.delim(data.file, header=TRUE, stringsAsFactors=FALSE)
# gene/target info
iSymbol <- iAnno[,2]
iGenes <- iAnno[,1:3]
# predictor info (including individual scores)
iPS <- iAnno[,22:26]
# additive predictor score
PS <- apply(iPS, 1, sum)
# combine additive predictor score and individual acores
iPS <- cbind(PS=PS, iPS)
The first 5 rows of the data frame iGenes are:
| Target.GeneID | Target.Symbol | Target.Name |
|---|---|---|
| 1 | A1BG | alpha-1-B glycoprotein |
| 2 | A2M | alpha-2-macroglobulin |
| 3 | A2MP1 | alpha-2-macroglobulin pseudogene 1 |
| 9 | NAT1 | N-acetyltransferase 1 (arylamine N-acetyltransferase) |
| 10 | NAT2 | N-acetyltransferase 2 (arylamine N-acetyltransferase) |
Their predictor info is stored in the data frame iPS, with the column PS for additive predictor scores:
| PS | OMIM | Phenotype | Function | Pathway | Druggable |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 |
Distributions of annotation predictor scores
df <- iPS
p <- ggplot(df, aes(PS))
p + geom_bar(fill="deepskyblue") + scale_y_log10() + geom_text(stat="bin",binwidth=1,aes(label=..count..),vjust=0,color="blue") + ylab("Number of genes (predicted)") + xlab("Annotation predictor scores (general indicator of immune-relatedness)") + scale_x_continuous(breaks=0:max(df$PS)) + theme_bw() +
theme(axis.title.y=element_text(size=14,color="blue"), axis.title.x=element_text(size=14,color="blue"))
Pairwise correlations between predictors
df <- df[df$PS>0, -1]
res <- supraHex::sDistance(t(df), metric="pearson")
diag(res) <- 1
cellnote <- signif(1-res, digits=2)
diag(cellnote) <- NA
visHeatmapAdv(1-res, Rowv=F, Colv=F, colormap="bwr", zlim=c(-1,1), key=F, cexRow=1, cexCol=1, cellnote=cellnote, notecex=0.8, notecol="black", lhei=c(1,5), lwid=c(1,5))
# TPN nomination info
iTPN <- iAnno[,9:15]
# flag whether nominated by ULTRA-DD TPN members
iNominated <- iAnno[,9]
# info provided by SGC Oxford
iSGC <- iAnno[,4:8]
# iSGC restricted to nominated targets
df <- iSGC[iNominated>0, ]
df$SGC_PI <- !(df$SGC_PI %in% c("Unassigned","Unassigned - TO",""))
Gene family analysis
Below show results of family analysis (provided by Brian and David).
The top family is
integral membrane proteins (IMP), followed by histone proteins (histone lysine demethylase; KDM) and ubiquitin-related proteins. Other top families include DENN domain proteins.
Allocations to SGC PIs
As seen below, 714 genes with 490 (68%) have been allocation to specific SGC-Oxford PIs.
SGC PI allocations are conditioned on SG/ChemBio Tractability:
Generally speaking, targets with good tractability are more likely to be allocated to SGC PIs.
Distributions of annotation predictor scores for nominated genes
Distributions of annotation predictor scores are conditioned on whether genes are nominated or not:
It can be seen that nominated genes tend to receive a higher annotation scores than non-nominated genes do.
Recall: AS genes prioritised by Pi are stored in the data frame AS_genes, and the data frame iPS for annotation predictor scores for all genes stored in iSymbol or iGenes (see above).
# find genes with annotation predictors
ind <- match(rownames(AS_genes), iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine AS_genes iPS iGenes iNominated
AS_combined <- cbind(AS_genes[,c(1,4)], genes_ap)
The top 20 AS-specific genes along with annotation predictors and TPN nominations are listed below:
| name | priority | PS | OMIM | Phenotype | Function | Pathway | Druggable | Nominated |
|---|---|---|---|---|---|---|---|---|
| CAST | 0.08460 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| ERAP2 | 0.03626 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ERAP1 | 0.03276 | 2 | 0 | 0 | 1 | 1 | 0 | 0 |
| B3GNT2 | 0.02077 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| IL23R | 0.01823 | 3 | 1 | 1 | 1 | 0 | 0 | 0 |
| LNPEP | 0.01696 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| DDX39B | 0.01284 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| IL12RB2 | 0.01197 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| ETS2 | 0.01160 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| PSMG1 | 0.01091 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| TBKBP1 | 0.01071 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NFKBIL1 | 0.01039 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| KIF21B | 0.01008 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| GPR65 | 0.00969 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| C1orf106 | 0.00854 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| IL6R | 0.00747 | 5 | 1 | 1 | 1 | 1 | 1 | 0 |
| TNFRSF1A | 0.00717 | 4 | 1 | 1 | 1 | 0 | 1 | 1 |
| NPEPPS | 0.00670 | 2 | 0 | 0 | 0 | 1 | 1 | 0 |
| KPNB1 | 0.00666 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| GALC | 0.00654 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
Comparing genetic priority and annotation prediction
This violin plot displays density (the violin shape) of genes receiving AS-specific genetic priority versus annotation predictor scores (general indicator of immune-relatedness).
hline <- quantile(AS_combined$priority,0.95) # genetic priority at the top 5%
hline1 <- quantile(AS_combined$priority,1) # genetic priority at the top
p <- ggplot(AS_combined, aes(factor(PS), priority))
p + geom_violin(trim=T,adjust=0.5) + scale_y_log10() + ylab("AS-specific genetic priority") + xlab("Annotation predictor scores (general indicator of immune-relatedness)") + theme_bw() +
theme(axis.title.y=element_text(size=14,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=hline,color="darkgreen",linetype="dotdash") + geom_vline(xintercept=c(2,5)+0.5,color="transparent",linetype="dotdash") + geom_segment(aes(x=2.5,xend=2.5,y=hline,yend=hline1),color="red",linetype="dotdash") + geom_segment(aes(x=4.5,xend=4.5,y=hline,yend=hline1),color="red",linetype="dotdash")
A horizontal line is for genetic priority at the top 5%, and two vertical lines separate well-annotated genes, moderately annotated genes, and genes seldom annotated. Therefore genes can be divided into 4 areas:
easy targets/genes receiving high genetic priority and being well-annotated. They are as expected considering sufficient existing knowledge in support. The followup risk is low but the novelty is compromised.a2maid targets/genes receiving high genetic priority and being moderately annotated. They might represent attractive targets as both the risk and novelty are at the twilight zone. The origin of the wording a2maid is explained here.1novel targets/genes receiving high genetic priority but with seldom annotations. Its great novelty comes the high risk to follow-up.The 43 easy targets/genes are:
| Target.Symbol | Target.Name | priority | PS | Nominated |
|---|---|---|---|---|
| IL6R | interleukin 6 receptor | 0.00747 | 5 | 0 |
| TNFRSF1A | tumor necrosis factor receptor superfamily, member 1A | 0.00717 | 4 | 1 |
| CARD9 | caspase recruitment domain family, member 9 | 0.00608 | 4 | 0 |
| TYK2 | tyrosine kinase 2 | 0.00396 | 4 | 0 |
| IFIH1 | interferon induced with helicase C domain 1 | 0.00390 | 4 | 0 |
| CD27 | CD27 molecule | 0.00331 | 4 | 0 |
| FCGR3A | Fc fragment of IgG, low affinity IIIa, receptor (CD16a) | 0.00277 | 4 | 0 |
| IL12B | interleukin 12B | 0.00255 | 5 | 0 |
| NFKB1 | nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 | 0.00252 | 4 | 0 |
| IL2RA | interleukin 2 receptor, alpha | 0.00250 | 5 | 1 |
| ICAM1 | intercellular adhesion molecule 1 | 0.00183 | 4 | 0 |
| NCF4 | neutrophil cytosolic factor 4, 40kDa | 0.00166 | 4 | 0 |
| AIRE | autoimmune regulator | 0.00096 | 4 | 0 |
| ADAR | adenosine deaminase, RNA-specific | 0.00090 | 4 | 1 |
| HLA-B | major histocompatibility complex, class I, B | 0.00075 | 4 | 0 |
| HLA-A | major histocompatibility complex, class I, A | 0.00067 | 5 | 0 |
| NOTCH1 | notch 1 | 0.00064 | 4 | 0 |
| B2M | beta-2-microglobulin | 0.00061 | 5 | 0 |
| IFNG | interferon, gamma | 0.00023 | 4 | 0 |
| STAT4 | signal transducer and activator of transcription 4 | 0.00021 | 4 | 0 |
| IL12A | interleukin 12A | 0.00021 | 4 | 0 |
| JAK2 | Janus kinase 2 | 0.00019 | 5 | 0 |
| IL1B | interleukin 1, beta | 0.00015 | 4 | 0 |
| IL2 | interleukin 2 | 0.00014 | 4 | 0 |
| IL10 | interleukin 10 | 0.00014 | 5 | 0 |
| NFKB2 | nuclear factor of kappa light polypeptide gene enhancer in B-cells 2 (p49/p100) | 0.00014 | 5 | 0 |
| JAK3 | Janus kinase 3 | 0.00014 | 4 | 0 |
| MALT1 | MALT1 paracaspase | 0.00013 | 4 | 0 |
| STAT3 | signal transducer and activator of transcription 3 (acute-phase response factor) | 0.00013 | 4 | 0 |
| IL1RN | interleukin 1 receptor antagonist | 0.00012 | 5 | 0 |
| ITGB2 | integrin, beta 2 (complement component 3 receptor 3 and 4 subunit) | 0.00011 | 5 | 0 |
| PIK3R1 | phosphoinositide-3-kinase, regulatory subunit 1 (alpha) | 0.00011 | 5 | 0 |
| CBL | Cbl proto-oncogene, E3 ubiquitin protein ligase | 0.00011 | 4 | 0 |
| PIK3CA | phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha | 0.00011 | 4 | 1 |
| IL6 | interleukin 6 | 0.00011 | 5 | 0 |
| NFKBIA | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha | 0.00011 | 5 | 0 |
| CD3E | CD3e molecule, epsilon (CD3-TCR complex) | 0.00010 | 5 | 0 |
| BCL10 | B-cell CLL/lymphoma 10 | 0.00009 | 4 | 0 |
| IL7R | interleukin 7 receptor | 0.00008 | 5 | 0 |
| AKT1 | v-akt murine thymoma viral oncogene homolog 1 | 0.00008 | 5 | 0 |
| FASLG | Fas ligand (TNF superfamily, member 6) | 0.00008 | 4 | 0 |
| FOS | FBJ murine osteosarcoma viral oncogene homolog | 0.00008 | 4 | 0 |
| RAF1 | Raf-1 proto-oncogene, serine/threonine kinase | 0.00008 | 4 | 0 |
The top 10 a2maid targets/genes are:
| Target.Symbol | Target.Name | priority | PS | Nominated |
|---|---|---|---|---|
| ERAP1 | endoplasmic reticulum aminopeptidase 1 | 0.03276 | 2 | 0 |
| IL23R | interleukin 23 receptor | 0.01823 | 3 | 0 |
| NPEPPS | aminopeptidase puromycin sensitive | 0.00670 | 2 | 0 |
| CACNA1S | calcium channel, voltage-dependent, L type, alpha 1S subunit | 0.00605 | 2 | 0 |
| FCGR2A | Fc fragment of IgG, low affinity IIa, receptor (CD32) | 0.00557 | 2 | 0 |
| ICAM4 | intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) | 0.00543 | 2 | 0 |
| FCGR2B | Fc fragment of IgG, low affinity IIb, receptor (CD32) | 0.00450 | 3 | 0 |
| ITGB3 | integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61) | 0.00401 | 3 | 0 |
| TBX21 | T-box 21 | 0.00395 | 2 | 0 |
| NOS2 | nitric oxide synthase 2, inducible | 0.00388 | 3 | 0 |
The top 5 novel targets/genes are:
| Target.Symbol | Target.Name | priority | PS | Nominated |
|---|---|---|---|---|
| CAST | calpastatin | 0.08460 | 1 | 0 |
| ERAP2 | endoplasmic reticulum aminopeptidase 2 | 0.03626 | 0 | 0 |
| B3GNT2 | UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 2 | 0.02077 | 0 | 0 |
| LNPEP | leucyl/cystinyl aminopeptidase | 0.01696 | 1 | 0 |
| DDX39B | DEAD (Asp-Glu-Ala-Asp) box polypeptide 39B | 0.01284 | 1 | 0 |
Label the gene network with annotation prediction
The nodes/genes in AS-specific gene network are color-coded by annotation additive predictor scores.
g <- AS_net
evi <- AS_combined[,3]
names(evi) <- rownames(AS_combined)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))
Nodes/genes evidenced by an individual predictor and/or nominated by TPN members are highlighted in pink. Here, only nodes with additive predictor scores >=4 are labeled by gene symbols.
g <- AS_net
evi <- AS_combined[, c(4:8,12)]
data <- evi[V(g)$name,]
vertex.label <- V(g)$name
vertex.label[apply(data[,1:5],1,sum)<=3] <- ''
visNetMul(g, data=data, glayout=layout_(g, with_kk()), vertex.label=vertex.label, vertex.shape="sphere", colormap="white-magenta", newpage=F, colorbar=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, vertex.size=8, mtext.cex=0.7,border.color="black",zlim=c(0,1), mtext.side=3, mtext.adj=0)
Recall: SA genes prioritised by Pi are stored in the data frame SA_genes, and the data frame iPS for annotation predictor scores for all genes stored in iSymbol or iGenes (see above).
# find genes with annotation predictors
ind <- match(rownames(SA_genes), iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine SA_genes iPS iGenes iNominated
SA_combined <- cbind(SA_genes[,c(1,4)], genes_ap)
The top 20 SA-specific genes along with annotation predictors and TPN nominations are listed below:
| name | priority | PS | OMIM | Phenotype | Function | Pathway | Druggable | Nominated |
|---|---|---|---|---|---|---|---|---|
| CAST | 0.02679 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| DDX39B | 0.02230 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| ATF6B | 0.01918 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NFKBIL1 | 0.01806 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| IER3 | 0.01174 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ERAP2 | 0.01148 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ERAP1 | 0.01037 | 2 | 0 | 0 | 1 | 1 | 0 | 0 |
| TRAF3IP2 | 0.01030 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| MDC1 | 0.00998 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| TNIP1 | 0.00967 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ANXA6 | 0.00925 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| VARS2 | 0.00913 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| LCE3D | 0.00903 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| LCE3A | 0.00849 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| LCE3B | 0.00818 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| LCE3C | 0.00802 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| LCE3E | 0.00751 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| TYK2 | 0.00672 | 4 | 1 | 1 | 0 | 1 | 1 | 0 |
| B3GNT2 | 0.00657 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| CNPY2 | 0.00652 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Comparing genetic priority and annotation prediction
This violin plot displays density (the violin shape) of genes receiving SA-specific genetic priority versus annotation predictor scores (general indicator of immune-relatedness).
hline <- quantile(SA_combined$priority,0.95) # genetic priority at the top 5%
hline1 <- quantile(SA_combined$priority,1) # genetic priority at the top
p <- ggplot(SA_combined, aes(factor(PS), priority))
p + geom_violin(trim=T,adjust=0.5) + scale_y_log10() + ylab("Spondyloarthritis-specific genetic priority") + xlab("Annotation predictor scores (general indicator of immune-relatedness)") + theme_bw() +
theme(axis.title.y=element_text(size=14,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=hline,color="darkgreen",linetype="dotdash") + geom_vline(xintercept=c(2,5)+0.5,color="transparent",linetype="dotdash") + geom_segment(aes(x=2.5,xend=2.5,y=hline,yend=hline1),color="red",linetype="dotdash") + geom_segment(aes(x=4.5,xend=4.5,y=hline,yend=hline1),color="red",linetype="dotdash")
A horizontal line is for genetic priority at the top 5%, and two vertical lines separate well-annotated genes, moderately annotated genes, and genes seldom annotated. Therefore genes can be divided into 4 areas:
easy targets/genes receiving high genetic priority and being well-annotated. They are as expected considering sufficient existing knowledge in support. The followup risk is low but the novelty is compromised.a2maid targets/genes receiving high genetic priority and being moderately annotated. They might represent attractive targets as both the risk and novelty are at the twilight zone.novel targets/genes receiving high genetic priority but with seldom annotations. Its great novelty comes the high risk to follow-up.The 49 easy targets/genes are:
| Target.Symbol | Target.Name | priority | PS | Nominated |
|---|---|---|---|---|
| TYK2 | tyrosine kinase 2 | 0.00672 | 4 | 0 |
| IL12B | interleukin 12B | 0.00518 | 5 | 0 |
| HLA-DQA1 | major histocompatibility complex, class II, DQ alpha 1 | 0.00392 | 4 | 0 |
| ICAM1 | intercellular adhesion molecule 1 | 0.00390 | 4 | 0 |
| IL6R | interleukin 6 receptor | 0.00243 | 5 | 0 |
| TNFRSF1A | tumor necrosis factor receptor superfamily, member 1A | 0.00233 | 4 | 1 |
| NFKBIA | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha | 0.00217 | 5 | 0 |
| DDX58 | DEAD (Asp-Glu-Ala-Asp) box polypeptide 58 | 0.00215 | 4 | 0 |
| CARD9 | caspase recruitment domain family, member 9 | 0.00194 | 4 | 0 |
| IFIH1 | interferon induced with helicase C domain 1 | 0.00114 | 4 | 0 |
| CD27 | CD27 molecule | 0.00105 | 4 | 0 |
| STAT3 | signal transducer and activator of transcription 3 (acute-phase response factor) | 0.00094 | 4 | 0 |
| NFKB1 | nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 | 0.00091 | 4 | 0 |
| FCGR3A | Fc fragment of IgG, low affinity IIIa, receptor (CD16a) | 0.00089 | 4 | 0 |
| IL2RA | interleukin 2 receptor, alpha | 0.00085 | 5 | 1 |
| C4A | complement component 4A (Rodgers blood group) | 0.00056 | 4 | 0 |
| NCF4 | neutrophil cytosolic factor 4, 40kDa | 0.00055 | 4 | 0 |
| TNF | tumor necrosis factor | 0.00044 | 4 | 0 |
| CFB | complement factor B | 0.00042 | 5 | 0 |
| AIRE | autoimmune regulator | 0.00034 | 4 | 0 |
| MAPK1 | mitogen-activated protein kinase 1 | 0.00031 | 4 | 0 |
| ADAR | adenosine deaminase, RNA-specific | 0.00031 | 4 | 1 |
| PSMB8 | proteasome (prosome, macropain) subunit, beta type, 8 | 0.00030 | 5 | 0 |
| HLA-B | major histocompatibility complex, class I, B | 0.00027 | 4 | 0 |
| HLA-A | major histocompatibility complex, class I, A | 0.00025 | 5 | 0 |
| B2M | beta-2-microglobulin | 0.00024 | 5 | 0 |
| NOTCH1 | notch 1 | 0.00022 | 4 | 0 |
| IKBKG | inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase gamma | 0.00022 | 5 | 0 |
| STAT5B | signal transducer and activator of transcription 5B | 0.00019 | 4 | 0 |
| PTPN11 | protein tyrosine phosphatase, non-receptor type 11 | 0.00019 | 4 | 0 |
| HLA-DQB1 | major histocompatibility complex, class II, DQ beta 1 | 0.00017 | 4 | 0 |
| CHUK | conserved helix-loop-helix ubiquitous kinase | 0.00015 | 4 | 0 |
| IL12A | interleukin 12A | 0.00014 | 4 | 0 |
| NFKB2 | nuclear factor of kappa light polypeptide gene enhancer in B-cells 2 (p49/p100) | 0.00013 | 5 | 0 |
| IKBKB | inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase beta | 0.00013 | 5 | 0 |
| IFNG | interferon, gamma | 0.00013 | 4 | 0 |
| STAT4 | signal transducer and activator of transcription 4 | 0.00012 | 4 | 0 |
| JAK2 | Janus kinase 2 | 0.00012 | 5 | 0 |
| MALT1 | MALT1 paracaspase | 0.00012 | 4 | 0 |
| CD40 | CD40 molecule, TNF receptor superfamily member 5 | 0.00012 | 5 | 0 |
| ITGB2 | integrin, beta 2 (complement component 3 receptor 3 and 4 subunit) | 0.00011 | 5 | 0 |
| HLA-DRB1 | major histocompatibility complex, class II, DR beta 1 | 0.00011 | 4 | 0 |
| IL2 | interleukin 2 | 0.00011 | 4 | 0 |
| IL1B | interleukin 1, beta | 0.00011 | 4 | 0 |
| RAG1 | recombination activating gene 1 | 0.00010 | 4 | 0 |
| PRKDC | protein kinase, DNA-activated, catalytic polypeptide | 0.00010 | 4 | 1 |
| EGFR | epidermal growth factor receptor | 0.00009 | 4 | 0 |
| FOS | FBJ murine osteosarcoma viral oncogene homolog | 0.00009 | 4 | 0 |
| JAK3 | Janus kinase 3 | 0.00009 | 4 | 0 |
The top 10 a2maid targets/genes are:
| Target.Symbol | Target.Name | priority | PS | Nominated |
|---|---|---|---|---|
| ERAP1 | endoplasmic reticulum aminopeptidase 1 | 0.01037 | 2 | 0 |
| ICAM3 | intercellular adhesion molecule 3 | 0.00594 | 2 | 0 |
| IL23R | interleukin 23 receptor | 0.00584 | 3 | 0 |
| TNFAIP3 | tumor necrosis factor, alpha-induced protein 3 | 0.00537 | 2 | 1 |
| FYN | FYN proto-oncogene, Src family tyrosine kinase | 0.00450 | 3 | 0 |
| ICAM4 | intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) | 0.00409 | 2 | 0 |
| HLA-E | major histocompatibility complex, class I, E | 0.00344 | 2 | 0 |
| NOS2 | nitric oxide synthase 2, inducible | 0.00296 | 3 | 0 |
| ITGAM | integrin, alpha M (complement component 3 receptor 3 subunit) | 0.00243 | 3 | 1 |
| NPEPPS | aminopeptidase puromycin sensitive | 0.00213 | 2 | 0 |
The top 5 novel targets/genes are:
| Target.Symbol | Target.Name | priority | PS | Nominated |
|---|---|---|---|---|
| CAST | calpastatin | 0.02679 | 1 | 0 |
| DDX39B | DEAD (Asp-Glu-Ala-Asp) box polypeptide 39B | 0.02230 | 1 | 0 |
| ATF6B | activating transcription factor 6 beta | 0.01918 | 0 | 0 |
| NFKBIL1 | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | 0.01806 | 1 | 0 |
| IER3 | immediate early response 3 | 0.01174 | 0 | 0 |
Label the gene network with annotation prediction
The nodes/genes in Spondyloarthritis-specific gene network are color-coded by annotation additive predictor scores.
g <- SA_net
evi <- SA_combined[,3]
names(evi) <- rownames(SA_combined)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))
Nodes/genes evidenced by an individual predictor and/or nominated by TPN members are highlighted in pink. Here, only nodes with additive predictor scores >=4 are labeled by gene symbols.
g <- SA_net
evi <- SA_combined[, c(4:8,12)]
data <- evi[V(g)$name,]
vertex.label <- V(g)$name
vertex.label[apply(data[,1:5],1,sum)<=3] <- ''
visNetMul(g, data=data, glayout=layout_(g, with_kk()), vertex.label=vertex.label, vertex.shape="sphere", colormap="white-magenta", newpage=F, colorbar=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, vertex.size=8, mtext.cex=0.7,border.color="black",zlim=c(0,1), mtext.side=3, mtext.adj=0)
Recall: SLE genes prioritised by Pi are stored in the data frame SLE_genes, and the data frame iPS for annotation predictor scores for all genes stored in iSymbol or iGenes (see above).
# find genes with annotation predictors
ind <- match(rownames(SLE_genes), iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine SLE_genes iPS iGenes iNominated
SLE_combined <- cbind(SLE_genes[,c(1,4)], genes_ap)
The top 20 SLE-specific genes along with annotation predictors and TPN nominations are listed below:
| name | priority | PS | OMIM | Phenotype | Function | Pathway | Druggable | Nominated |
|---|---|---|---|---|---|---|---|---|
| ATF6B | 0.04358 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NELFE | 0.02335 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| VARS2 | 0.02029 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| ITGAM | 0.01884 | 3 | 1 | 1 | 1 | 0 | 0 | 1 |
| PBX2 | 0.01783 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| PYCARD | 0.01753 | 2 | 0 | 0 | 1 | 1 | 0 | 0 |
| CUEDC2 | 0.01746 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| STAT4 | 0.01707 | 4 | 1 | 1 | 0 | 1 | 1 | 0 |
| HLA-DPB1 | 0.01576 | 2 | 1 | 0 | 0 | 1 | 0 | 0 |
| IRF5 | 0.01540 | 3 | 1 | 1 | 0 | 1 | 0 | 1 |
| TNPO3 | 0.01533 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| CLIC1 | 0.01527 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| PSMB9 | 0.01482 | 3 | 0 | 0 | 1 | 1 | 1 | 0 |
| RNF114 | 0.01427 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| C2 | 0.01354 | 2 | 0 | 0 | 1 | 1 | 0 | 0 |
| LTA | 0.01303 | 3 | 1 | 0 | 0 | 1 | 1 | 0 |
| SKIV2L | 0.01275 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NCF2 | 0.01219 | 3 | 1 | 0 | 1 | 1 | 0 | 0 |
| HNRNPM | 0.01217 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| SMG7 | 0.01137 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Comparing genetic priority and annotation prediction
This violin plot displays density (the violin shape) of genes receiving SLE-specific genetic priority versus annotation predictor scores (general indicator of immune-relatedness).
hline <- quantile(SLE_combined$priority,0.95) # genetic priority at the top 5%
hline1 <- quantile(SLE_combined$priority,1) # genetic priority at the top
p <- ggplot(SLE_combined, aes(factor(PS), priority))
p + geom_violin(trim=T,adjust=0.5) + scale_y_log10() + ylab("SLE-specific genetic priority") + xlab("Annotation predictor scores (general indicator of immune-relatedness)") + theme_bw() +
theme(axis.title.y=element_text(size=14,color="blue"), axis.title.x=element_text(size=14,color="blue")) + geom_hline(yintercept=hline,color="darkgreen",linetype="dotdash") + geom_vline(xintercept=c(2,5)+0.5,color="transparent",linetype="dotdash") + geom_segment(aes(x=2.5,xend=2.5,y=hline,yend=hline1),color="red",linetype="dotdash") + geom_segment(aes(x=4.5,xend=4.5,y=hline,yend=hline1),color="red",linetype="dotdash")
A horizontal line is for genetic priority at the top 5%, and two vertical lines separate well-annotated genes, moderately annotated genes, and genes seldom annotated. Therefore genes can be divided into 4 areas:
easy targets/genes receiving high genetic priority and being well-annotated. They are as expected considering sufficient existing knowledge in support. The followup risk is low but the novelty is compromised.a2maid targets/genes receiving high genetic priority and being moderately annotated. They might represent attractive targets as both the risk and novelty are at the twilight zone.novel targets/genes receiving high genetic priority but with seldom annotations. Its great novelty comes the high risk to follow-up.The 47 easy targets/genes are:
| Target.Symbol | Target.Name | priority | PS | Nominated |
|---|---|---|---|---|
| STAT4 | signal transducer and activator of transcription 4 | 0.01707 | 4 | 0 |
| C4A | complement component 4A (Rodgers blood group) | 0.00709 | 4 | 0 |
| HLA-DQA1 | major histocompatibility complex, class II, DQ alpha 1 | 0.00570 | 4 | 0 |
| PTPN22 | protein tyrosine phosphatase, non-receptor type 22 (lymphoid) | 0.00365 | 4 | 2 |
| TYK2 | tyrosine kinase 2 | 0.00217 | 4 | 0 |
| CFB | complement factor B | 0.00208 | 5 | 0 |
| IRF8 | interferon regulatory factor 8 | 0.00172 | 4 | 0 |
| IFIH1 | interferon induced with helicase C domain 1 | 0.00157 | 4 | 0 |
| IRF7 | interferon regulatory factor 7 | 0.00139 | 4 | 0 |
| CDKN1B | cyclin-dependent kinase inhibitor 1B (p27, Kip1) | 0.00123 | 4 | 0 |
| TNF | tumor necrosis factor | 0.00113 | 4 | 0 |
| HRAS | Harvey rat sarcoma viral oncogene homolog | 0.00107 | 4 | 0 |
| IL10 | interleukin 10 | 0.00093 | 5 | 0 |
| FCGR3A | Fc fragment of IgG, low affinity IIIa, receptor (CD16a) | 0.00084 | 4 | 0 |
| PNP | purine nucleoside phosphorylase | 0.00081 | 4 | 0 |
| IL12A | interleukin 12A | 0.00071 | 4 | 0 |
| ICAM1 | intercellular adhesion molecule 1 | 0.00065 | 4 | 0 |
| MAPK1 | mitogen-activated protein kinase 1 | 0.00042 | 4 | 0 |
| HLA-DQB1 | major histocompatibility complex, class II, DQ beta 1 | 0.00030 | 4 | 0 |
| CHUK | conserved helix-loop-helix ubiquitous kinase | 0.00028 | 4 | 0 |
| CARD9 | caspase recruitment domain family, member 9 | 0.00026 | 4 | 0 |
| MASP2 | mannan-binding lectin serine peptidase 2 | 0.00026 | 5 | 0 |
| CIITA | class II, major histocompatibility complex, transactivator | 0.00024 | 4 | 1 |
| C1QA | complement component 1, q subcomponent, A chain | 0.00021 | 4 | 0 |
| HLA-DRB1 | major histocompatibility complex, class II, DR beta 1 | 0.00020 | 4 | 0 |
| BCL2 | B-cell CLL/lymphoma 2 | 0.00020 | 5 | 0 |
| CYBB | cytochrome b-245, beta polypeptide | 0.00020 | 5 | 0 |
| TNFRSF1A | tumor necrosis factor receptor superfamily, member 1A | 0.00019 | 4 | 1 |
| PSTPIP1 | proline-serine-threonine phosphatase interacting protein 1 | 0.00019 | 4 | 0 |
| MEFV | Mediterranean fever | 0.00018 | 4 | 0 |
| IKBKG | inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase gamma | 0.00017 | 5 | 0 |
| NLRP3 | NLR family, pyrin domain containing 3 | 0.00016 | 4 | 1 |
| NFKB1 | nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 | 0.00014 | 4 | 0 |
| MYD88 | myeloid differentiation primary response 88 | 0.00014 | 4 | 0 |
| IL12B | interleukin 12B | 0.00013 | 5 | 0 |
| IL1B | interleukin 1, beta | 0.00013 | 4 | 0 |
| ITGB2 | integrin, beta 2 (complement component 3 receptor 3 and 4 subunit) | 0.00013 | 5 | 0 |
| PDGFRA | platelet-derived growth factor receptor, alpha polypeptide | 0.00013 | 4 | 0 |
| CREBBP | CREB binding protein | 0.00012 | 4 | 1 |
| C3 | complement component 3 | 0.00012 | 5 | 0 |
| PSMB8 | proteasome (prosome, macropain) subunit, beta type, 8 | 0.00012 | 5 | 0 |
| IL2RA | interleukin 2 receptor, alpha | 0.00012 | 5 | 1 |
| PRKDC | protein kinase, DNA-activated, catalytic polypeptide | 0.00012 | 4 | 1 |
| IFNG | interferon, gamma | 0.00011 | 4 | 0 |
| IKBKB | inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase beta | 0.00011 | 5 | 0 |
| ITK | IL2-inducible T-cell kinase | 0.00010 | 5 | 0 |
| TLR2 | toll-like receptor 2 | 0.00010 | 4 | 0 |
The top 10 a2maid targets/genes are:
| Target.Symbol | Target.Name | priority | PS | Nominated |
|---|---|---|---|---|
| ITGAM | integrin, alpha M (complement component 3 receptor 3 subunit) | 0.01884 | 3 | 1 |
| PYCARD | PYD and CARD domain containing | 0.01753 | 2 | 0 |
| HLA-DPB1 | major histocompatibility complex, class II, DP beta 1 | 0.01576 | 2 | 0 |
| IRF5 | interferon regulatory factor 5 | 0.01540 | 3 | 1 |
| PSMB9 | proteasome (prosome, macropain) subunit, beta type, 9 | 0.01482 | 3 | 0 |
| C2 | complement component 2 | 0.01354 | 2 | 0 |
| LTA | lymphotoxin alpha | 0.01303 | 3 | 0 |
| NCF2 | neutrophil cytosolic factor 2 | 0.01219 | 3 | 0 |
| STAT1 | signal transducer and activator of transcription 1, 91kDa | 0.00620 | 3 | 0 |
| TNFAIP3 | tumor necrosis factor, alpha-induced protein 3 | 0.00352 | 2 | 1 |
The top 5 novel targets/genes are:
| Target.Symbol | Target.Name | priority | PS | Nominated |
|---|---|---|---|---|
| ATF6B | activating transcription factor 6 beta | 0.04358 | 0 | 0 |
| NELFE | negative elongation factor complex member E | 0.02335 | 0 | 0 |
| VARS2 | valyl-tRNA synthetase 2, mitochondrial | 0.02029 | 0 | 1 |
| PBX2 | pre-B-cell leukemia homeobox 2 | 0.01783 | 0 | 0 |
| CUEDC2 | CUE domain containing 2 | 0.01746 | 0 | 0 |
Label the gene network with annotation prediction
The nodes/genes in SLE-specific gene network are color-coded by annotation additive predictor scores.
g <- SLE_net
evi <- SLE_combined[,3]
names(evi) <- rownames(SLE_combined)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))
Nodes/genes evidenced by an individual predictor and/or nominated by TPN members are highlighted in pink. Here, only nodes with additive predictor scores >=4 are labeled by gene symbols.
g <- SLE_net
evi <- SLE_combined[, c(4:8,12)]
data <- evi[V(g)$name,]
vertex.label <- V(g)$name
vertex.label[apply(data[,1:5],1,sum)<=3] <- ''
visNetMul(g, data=data, glayout=layout_(g, with_kk()), vertex.label=vertex.label, vertex.shape="sphere", colormap="white-magenta", newpage=F, colorbar=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, vertex.size=8, mtext.cex=0.7,border.color="black",zlim=c(0,1), mtext.side=3, mtext.adj=0)
Recall: disease-centric genetic prioritisation for genes by Pi are stored respectively in three data frames, that is, AS_genes for Ankylosing Spondylitis, SA_genes for Spondyloarthritis and SLE_genes for Systemic Lupus Erythematosus as exemplars. The data frame iPS stores annotation predictor scores for all genes, the vector iNominated for nominated genes, and iSymbol or iGenes for gene info.
# Ankylosing Spondylitis (AS)
ind <- match(iSymbol, AS_genes$name)
df_AS <- AS_genes[ind,]
# Spondyloarthritis (SA)
ind <- match(iSymbol, SA_genes$name)
df_SA <- SA_genes[ind,]
# Systemic Lupus Erythematosus (SLE)
ind <- match(iSymbol, SLE_genes$name)
df_SLE <- SLE_genes[ind,]
# Combine into a data frame 'df_rank' and `df_priority`
## rank
df_rank <- cbind(AS=df_AS$rank, SA=df_SA$rank, SLE=df_SLE$rank)
## priority
df_priority <- cbind(AS=df_AS$priority, SA=df_SA$priority, SLE=df_SLE$priority)
rownames(df_rank) <- rownames(df_priority) <- iSymbol
Venn diagram of the top 200 genes across diseases
ntop <- 200
data <- list()
data$AS <- iSymbol[which(df_rank[,1] <= ntop)]
data$SA <- iSymbol[which(df_rank[,2] <= ntop)]
data$SLE <- iSymbol[which(df_rank[,3] <= ntop)]
category.names <- c('Ankylosing Spondylitis (AS)', 'Spondyloarthritis (SA)', 'Systemic Lupus Erythematosus (SLE)')
vp <- venn.diagram(x=data, filename=NULL, fill=c("skyblue","pink1","mediumorchid"), category.names=names(data), margin=0.15)
grid.draw(vp)
The 14 genes common to three diseases (along with annotation additive predictor scores and how many times being nominated by TPN members) are listed below:
| Target.Symbol | Target.Name | Annotated | Nominated |
|---|---|---|---|
| UBE2L3 | ubiquitin-conjugating enzyme E2L 3 | 1 | 1 |
| IFIH1 | interferon induced with helicase C domain 1 | 4 | 0 |
| TYK2 | tyrosine kinase 2 | 4 | 0 |
| FCGR2B | Fc fragment of IgG, low affinity IIb, receptor (CD32) | 3 | 0 |
| FCGR2A | Fc fragment of IgG, low affinity IIa, receptor (CD32) | 2 | 0 |
| ICAM3 | intercellular adhesion molecule 3 | 2 | 0 |
| DDX39B | DEAD (Asp-Glu-Ala-Asp) box polypeptide 39B | 1 | 0 |
| LTBR | lymphotoxin beta receptor (TNFR superfamily, member 3) | 1 | 0 |
| NFKBIL1 | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | 1 | 0 |
| CDC37 | cell division cycle 37 | 0 | 0 |
| FDX1L | ferredoxin 1-like | 0 | 0 |
| HSPA6 | heat shock 70kDa protein 6 (HSP70B’) | 0 | 0 |
| RAVER1 | ribonucleoprotein, PTB-binding 1 | 0 | 0 |
| RLTPR | RGD motif, leucine rich repeats, tropomodulin domain and proline-rich containing | 0 | 0 |
Venn diagram of the top 200 genes in any diseases vs 714 TPN nominated genes
ntop <- 200
data <- list()
data$AS <- iSymbol[which(df_rank[,1] <= ntop)]
data$SA <- iSymbol[which(df_rank[,2] <= ntop)]
data$SLE <- iSymbol[which(df_rank[,3] <= ntop)]
data$Prioritised <- base::Reduce(union, data[1:3])
data$Nominated <- iSymbol[which(iNominated>0)]
category.names <- c('Prioritised', 'Nominated')
vp <- venn.diagram(x=data[4:5], filename=NULL, fill=c("red","orange"), category.names=category.names, margin=0.15, cat.pos=1)
grid.draw(vp)
The 30 genes common to disease-specific genetic prioritisations and TPN nominations (along with annotation additive predictor scores and how many times being nominated by TPN members) are listed below:
| Target.Symbol | Target.Name | Annotated | Nominated |
|---|---|---|---|
| PXK | PX domain containing serine/threonine kinase | 1 | 4 |
| PTPN22 | protein tyrosine phosphatase, non-receptor type 22 (lymphoid) | 4 | 2 |
| IL2RA | interleukin 2 receptor, alpha | 5 | 1 |
| TNFRSF1A | tumor necrosis factor receptor superfamily, member 1A | 4 | 1 |
| BLK | BLK proto-oncogene, Src family tyrosine kinase | 3 | 1 |
| IRF5 | interferon regulatory factor 5 | 3 | 1 |
| ITGAM | integrin, alpha M (complement component 3 receptor 3 subunit) | 3 | 1 |
| TNFAIP3 | tumor necrosis factor, alpha-induced protein 3 | 2 | 1 |
| BANK1 | B-cell scaffold protein with ankyrin repeats 1 | 1 | 1 |
| GPR65 | G protein-coupled receptor 65 | 1 | 1 |
| IL23A | interleukin 23, alpha subunit p19 | 1 | 1 |
| KAT8 | K(lysine) acetyltransferase 8 | 1 | 1 |
| SCNN1A | sodium channel, non voltage gated 1 alpha subunit | 1 | 1 |
| SLC11A1 | solute carrier family 11 (proton-coupled divalent metal ion transporter), member 1 | 1 | 1 |
| SLC15A4 | solute carrier family 15 (oligopeptide transporter), member 4 | 1 | 1 |
| SPHK2 | sphingosine kinase 2 | 1 | 1 |
| UBE2L3 | ubiquitin-conjugating enzyme E2L 3 | 1 | 1 |
| APOBEC4 | apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 4 (putative) | 0 | 1 |
| BRWD1 | bromodomain and WD repeat domain containing 1 | 0 | 1 |
| CCDC101 | coiled-coil domain containing 101 | 0 | 1 |
| MBTPS2 | membrane-bound transcription factor peptidase, site 2 | 0 | 1 |
| ORAI3 | ORAI calcium release-activated calcium modulator 3 | 0 | 1 |
| PARP11 | poly (ADP-ribose) polymerase family, member 11 | 0 | 1 |
| RAB5A | RAB5A, member RAS oncogene family | 0 | 1 |
| RAB5B | RAB5B, member RAS oncogene family | 0 | 1 |
| RIOK2 | RIO kinase 2 | 0 | 1 |
| SLC9A8 | solute carrier family 9, subfamily A (NHE8, cation proton antiporter 8), member 8 | 0 | 1 |
| SMPD2 | sphingomyelin phosphodiesterase 2, neutral membrane (neutral sphingomyelinase) | 0 | 1 |
| TET3 | tet methylcytosine dioxygenase 3 | 0 | 1 |
| VARS2 | valyl-tRNA synthetase 2, mitochondrial | 0 | 1 |
Recall: disease-centric genetic prioritisation for pathways by Pi are stored respectively in three data frames, that is, AS_pathways for Ankylosing Spondylitis, SA_pathways for Spondyloarthritis and SLE_pathways for Systemic Lupus Erythematosus as exemplars.
Load all pathways into a data frame iPathway:
org.Hs.egMsigdbC2CPall <- xRDataLoader(RData.customised='org.Hs.egMsigdbC2CPall', RData.location=RData.location)
iPathway <- org.Hs.egMsigdbC2CPall$set_info[, c(1,2)]
# Ankylosing Spondylitis (AS)
ind <- match(rownames(iPathway), rownames(AS_pathways))
df_AS <- AS_pathways[ind,]
# Spondyloarthritis (SA)
ind <- match(rownames(iPathway), rownames(SA_pathways))
df_SA <- SA_pathways[ind,]
# Systemic Lupus Erythematosus (SLE)
ind <- match(rownames(iPathway), rownames(SLE_pathways))
df_SLE <- SLE_pathways[ind,]
# Combine into a data frame 'df_adjp'
## adjusted p-values
df_adjp <- cbind(AS=df_AS$adjp, SA=df_SA$adjp, SLE=df_SLE$adjp)
rownames(df_adjp) <- iPathway$name
This barplot displays prioritised pathways for three diseases, in which horizontal lines are used to indicate three groups of pathways prioritised:
IL23-mediated signaling events, IL27-mediated signaling events.Recall: the gene network identified by Pi are stored respectively in three data frames, that is, AS_net for Ankylosing Spondylitis, SA_net for Spondyloarthritis and SLE_net for Systemic Lupus Erythematosus as exemplars. The data frame iPS stores annotation predictor scores for all genes, the vector iNominated for nominated genes, and iSymbol or iGenes for gene info.
# Ankylosing Spondylitis (AS)
ind <- match(iSymbol, V(AS_net)$name)
df_AS <- V(AS_net)$name[ind]
# Spondyloarthritis (SA)
ind <- match(iSymbol, V(SA_net)$name)
df_SA <- V(SA_net)$name[ind]
# Systemic Lupus Erythematosus (SLE)
ind <- match(iSymbol, V(SLE_net)$name)
df_SLE <- V(SLE_net)$name[ind]
# Combine into a data frame 'df_net'
df_net <- cbind(AS=df_AS, SA=df_SA, SLE=df_SLE)
rownames(df_net) <- iSymbol
Venn diagram of network genes across diseases
data <- list()
data$AS <- iSymbol[!is.na(df_net[,1])]
data$SA <- iSymbol[!is.na(df_net[,2])]
data$SLE <- iSymbol[!is.na(df_net[,3])]
category.names <- c('Ankylosing Spondylitis (AS)', 'Spondyloarthritis (SA)', 'Systemic Lupus Erythematosus (SLE)')
vp <- venn.diagram(x=data, filename=NULL, fill=c("skyblue","pink1","mediumorchid"), category.names=names(data), margin=0.2)
grid.draw(vp)
The common part of the gene network shared by AS, SA and SLE
# identify common parts of SLE-network and AS-network.
net_AS_SA_SLE <- graph.intersection(AS_net, SA_net, SLE_net, keep.all.vertices=F)
# append genes with info about annotation predictors
g <- net_AS_SA_SLE
ind <- match(V(g)$name, iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine iGenes iPS iNominated
g_info <- genes_ap[,c(8:9,1,10)]
colnames(g_info)[3] <- 'Annotated'
rownames(g_info) <- g_info[,1]
The 3 genes common to three diseases (along with annotation additive predictor scores and how many times being nominated by TPN members) are listed below:
| Target.Symbol | Target.Name | Annotated | Nominated |
|---|---|---|---|
| TYK2 | tyrosine kinase 2 | 4 | 0 |
| UBA52 | ubiquitin A-52 residue ribosomal protein fusion product 1 | 2 | 0 |
| CDC37 | cell division cycle 37 | 0 | 0 |
The network nodes/genes shared by three diseases are color-coded by annotation additive predictor scores.
# visualise intersected network part with nodes/genes color-encoded by annotation predictors
evi <- g_info[,3]
names(evi) <- rownames(g_info)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))
The common part of the gene network shared by AS and SA
# identify common parts of AS-network and SA-network.
net_AS_SA <- graph.intersection(AS_net, SA_net, keep.all.vertices=F)
# append genes with info about annotation predictors
g <- net_AS_SA
ind <- match(V(g)$name, iSymbol)
genes_ap <- cbind(iPS[ind,], iGenes[ind,], Nominated=iNominated[ind])
genes_ap[is.na(genes_ap)] <- 0 # NA means no annotation predictors
# combine iGenes iPS iNominated
g_info <- genes_ap[,c(8:9,1,10)]
colnames(g_info)[3] <- 'Annotated'
rownames(g_info) <- g_info[,1]
The common 21 genes along with annotation predictors and TPN nominations are listed below:
| Target.Symbol | Target.Name | Annotated | Nominated |
|---|---|---|---|
| TNFRSF1A | tumor necrosis factor receptor superfamily, member 1A | 4 | 1 |
| SCNN1A | sodium channel, non voltage gated 1 alpha subunit | 1 | 1 |
| IL12B | interleukin 12B | 5 | 0 |
| IL6R | interleukin 6 receptor | 5 | 0 |
| ICAM1 | intercellular adhesion molecule 1 | 4 | 0 |
| TYK2 | tyrosine kinase 2 | 4 | 0 |
| IL23R | interleukin 23 receptor | 3 | 0 |
| ITGB3 | integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61) | 3 | 0 |
| NOS2 | nitric oxide synthase 2, inducible | 3 | 0 |
| ERAP1 | endoplasmic reticulum aminopeptidase 1 | 2 | 0 |
| ICAM3 | intercellular adhesion molecule 3 | 2 | 0 |
| ICAM4 | intercellular adhesion molecule 4 (Landsteiner-Wiener blood group) | 2 | 0 |
| NPEPPS | aminopeptidase puromycin sensitive | 2 | 0 |
| TBX21 | T-box 21 | 2 | 0 |
| UBA52 | ubiquitin A-52 residue ribosomal protein fusion product 1 | 2 | 0 |
| CAST | calpastatin | 1 | 0 |
| IL12RB2 | interleukin 12 receptor, beta 2 | 1 | 0 |
| LNPEP | leucyl/cystinyl aminopeptidase | 1 | 0 |
| CDC37 | cell division cycle 37 | 0 | 0 |
| MRPL4 | mitochondrial ribosomal protein L4 | 0 | 0 |
| RUNX3 | runt-related transcription factor 3 | 0 | 0 |
The network nodes/genes shared by AS and SA are color-coded by annotation additive predictor scores.
# visualise intersected network part with nodes/genes color-encoded by annotation predictors
evi <- g_info[,3]
names(evi) <- rownames(g_info)
pattern <- evi[V(g)$name]
zmax <- max(pattern)
xVisNet(g, pattern=pattern, glayout=layout_(g, with_kk()), vertex.shape="sphere", colormap="white-darkred", newpage=F, edge.arrow.size=0.3, vertex.label.color="blue", vertex.label.dist=0.35, vertex.label.font=2, zlim=c(0,zmax))
The common part of the gene network shared by SLE and SA
The common 10 genes along with annotation predictors and TPN nominations are listed below:
| Target.Symbol | Target.Name | Annotated | Nominated |
|---|---|---|---|
| ITGAM | integrin, alpha M (complement component 3 receptor 3 subunit) | 3 | 1 |
| TYK2 | tyrosine kinase 2 | 4 | 0 |
| FCGR2A | Fc fragment of IgG, low affinity IIa, receptor (CD32) | 2 | 0 |
| UBA52 | ubiquitin A-52 residue ribosomal protein fusion product 1 | 2 | 0 |
| ETS1 | v-ets avian erythroblastosis virus E26 oncogene homolog 1 | 1 | 0 |
| SQSTM1 | sequestosome 1 | 1 | 0 |
| TRIM56 | tripartite motif containing 56 | 1 | 0 |
| CDC37 | cell division cycle 37 | 0 | 0 |
| ITGAX | integrin, alpha X (complement component 3 receptor 4 subunit) | 0 | 0 |
| RPL34 | ribosomal protein L34 | 0 | 0 |
The network nodes/genes shared by SLE and SA are color-coded by annotation additive predictor scores.
The common part of the gene network shared by SLE and AS
The common 4 genes along with annotation predictors and TPN nominations are listed below:
| Target.Symbol | Target.Name | Annotated | Nominated |
|---|---|---|---|
| TYK2 | tyrosine kinase 2 | 4 | 0 |
| UBA52 | ubiquitin A-52 residue ribosomal protein fusion product 1 | 2 | 0 |
| SOCS1 | suppressor of cytokine signaling 1 | 1 | 0 |
| CDC37 | cell division cycle 37 | 0 | 0 |
The network nodes/genes shared by SLE and AS are color-coded by annotation additive predictor scores.
Below is a list of references that this work stands on:
1000 Genomes Project Consortium. 2012. “An integrated map of genetic variation from 1,092 human genomes.” Nature 135 (V): 0–9. doi:10.1038/nature11632.
Ashburner, M, C A Ball, J A Blake, D Botstein, H Butler, J M Cherry, A P Davis, et al. 2000. “Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.” Nat Genet 25 (1): 25–29. doi:10.1038/75556.
Cerami, E. G., B. E. Gross, E. Demir, I. Rodchenkov, O. Babur, N. Anwar, N. Schultz, G. D. Bader, and C. Sander. 2011. “Pathway Commons, a web resource for biological pathway data.” Nucleic Acids Research 39 (Database): D685–D690. doi:10.1093/nar/gkq1039.
Fairfax, Benjamin P, Peter Humburg, Seiko Makino, Vivek Naranbhai, Daniel Wong, Evelyn Lau, Luke Jostins, et al. 2014. “Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression.” Science (New York, N.Y.) 343 (March): 1246949. doi:10.1126/science.1246949.
Fang, Hai, and Julian Gough. 2014. “The ’dnet’ approach promotes emerging research on cancer patient survival.” Genome Medicine 6 (8): 64. doi:10.1186/s13073-014-0064-8.
Köhler, Sebastian, Sandra C Doelken, Christopher J Mungall, Sebastian Bauer, Helen V Firth, Isabelle Bailleul-Forestier, Graeme C M Black, et al. 2013. “The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data.” Nucleic Acids Research, November, 1–9. doi:10.1093/nar/gkt1026.
Schriml, L M, C Arze, S Nadendla, Y W Chang, M Mazaitis, V Felix, G Feng, and W A Kibbe. 2012. “Disease Ontology: a backbone for disease semantic integration.” Nucleic Acids Res 40 (Database issue): D940–6. doi:10.1093/nar/gkr972.
Smith, C L, and J T Eppig. 2009. “The Mammalian Phenotype Ontology: enabling robust annotation and comparative analysis.” Wiley Interdiscip Rev Syst Biol Med 1 (3): 390–99. doi:10.1002/wsbm.44.
Szklarczyk, Damian, Andrea Franceschini, Stefan Wyder, Kristoffer Forslund, Davide Heller, Jaime Huerta-cepas, Milan Simonovic, et al. 2015. “STRING v10 : protein – protein interaction networks , integrated over the tree of life.” Nucleic Acids Res 43 (Database): D447–D452. doi:10.1093/nar/gku1003.
Welter, Danielle, Jacqueline MacArthur, Joannella Morales, Tony Burdett, Peggy Hall, Heather Junkins, Alan Klemm, et al. 2014. “The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.” Nucleic Acids Research 42 (D1): 1001–6. doi:10.1093/nar/gkt1229.
The wording a2maid comes from the mermaid (describing the maid of the sea having a fish’s tail), and instead of the maid of the sea, replacing with a2 turns to be the maid of our lab having two PhD students Anna Sanniti and Alicia Lledo Lara.↩